PDA

View Full Version : [Mondrian] Mondrian Roles and Cell Level Access - A Post Mortem



Richard Emberson
05-14-2007, 10:21 AM
Mondrian Roles and Cell Level Access - A Post Mortem


Current Mondrian allows one to permission a Hierarchy's leaf level
members. Queries run at the leaf level will then only display
those permissioned members. Queries at higher levels, that is, at
aggregate levels, with such a permissioning, the results do not
reflect the leaf level permissioning.
Rather, all leafs are aggregated into the generated results.
For some, this is the behavior that is desired, but, for others,
this is viewed as a security hole - a user should only see what
they are permissioned to see even at higher, aggregate levels.
I have just such a customer requirement.

Last month I attempted to patch Mondrian to address this requirement.
A further requirement is that a given dimension can have up to 1
million leaf members.
While these million leaf members will not have the same parent member,
one could have tens of thousands of leaf members that do have the same
parent. What is important about this requirement is that it
pretty much says that one can not enumerate the permitted member names
in some SQL clause - there are just too many.
Most customers do not have such large dimensions nor do they have
so many leaf level siblings - FoodMart certainly does not, but it can occur
(I have such a customer).

A further requirement is that performance be comparable to a
non-permissioned
system. This implies that aggregate tables still be used in some
manner and that the in-memory aggregate cache also be used.

The last requirement was that the modifications to Mondrian be small - I
did not want to re-write whole parts of the system.

The Requirements:

1) Aggregate results include only data from permissioned members.
2) Support large number of permissioned members.
3) Performance using aggregate tables and in-memory aggregate cache.
4) Point modifications to Mondrian only.

What I ended up with was a version of Mondrian that fulfilled all the
requirement but the requirement dealing with aggregate tables.
I actually had an earlier version that even allowed the use
aggregate tables but some had issues with the approach I took,
so I backed it out.

Changes to Mondrian code:

Role.CellLevel
First, I extented the Role interface with a new interface called
Role.CellLevel with the following additional methods:
int getId();
boolean hasAggregateTables();
boolean hasAllPermissions();
void initialize(Schema s, DataSource dataSource);

In the initialize() method in the Role.CellLevel implementation class,
for each Hierarchy in the schema that was permissioned,
walked down the Hierarchy's Levels determining the Access value per
Member
for the current user (role).
This approach was taken because our member-role permissions were
stored in the database for leaf-level members only and the permissions
of intermediate level members had to be inferred from that of their
child members.

Also, the Role.CellLevel interface has the following inner (but static)
class:
public class Context {
public static final ThreadLocal<Role> Current =
new ThreadLocal<Role>();
}
This was used to pass the Role down the Mondrian execution stack.
RolapConnection
In RolapConnection's setRole() method, if the Role is an instance
of the
Role.CellLevel, then its initialize() method is called allowing one to
initialize the Role from the Schema and DataSource.

In the RolapConnection's execute() method, the RolapConnection's
current
Role was registered with the current thread by calling:
Role.CellLevel.Context.Current.set(getRole());

RolapStar aggregate cache
Role-based aggregate caching in RolapStar was implemented by
changing all of the caches from using a BitKey as the template Map key
to using an object which contains both the BitKey and the thread's
current Role.
No other changes were required to all that RolapStar aggregate
caching code!!!

Sql generation and schema definition
In the schema definition of a Hierarchy, I added a new element,
"PermissionExpr" (same base type as "MemberExpr") that contained
a where-clause predicate to be used for determining the access
permission of a given role for that Hierarchy. For example, the
Customer Hierarchy might have the template predicate (for mysql):

`FACT_TABLE_NAME`.`FACT_TABLE_FOREIGN_KEY` in (
select `customer_permissions`.`customer_id`
from `customer_permissions` as `customer_permissions`
where `customer_permissions`.`role_id` = ROLE_ID )

Where the key word FACT_TABLE_NAME and FACT_TABLE_FOREIGN_KEY
are replaced with the query's Cube's fact table and the
foreign key to the Hierarchy table and the key word ROLE_ID
is replaced with the current user's role id value prior to
SQL execution.

In the low-level classes where SQL is generated,
mondrian.rolap.agg.AggregationManager
mondrian.rolap.agg.SegmentArrayQuerySpec
mondrian.rolap.agg.AbstractQuerySpec
I made modification so that when the current thread's Role was of type
Role.CellLevel, then, for each Hierarchy in the query that had
the schema template where-clause predicate (in the PermissionExpr
element),
the predicate's key words were replaced and it was added to the
queries generated SQL.

Aggregate Tables
For aggregate tables, I added a new column, called "role_id" to
all aggregates, created three roles with permission tables for
the Product and Customer dimesions and modified the db loading code
so that it created aggregate table rollups for not only the default,
full permission, role; but also the other three partially
permissioned roles.

In the class
mondrian.rolap.agg.AggQuerySpec
the where-clause predicate
role_id = ROLE_ID
where ROLE_ID is replaces with the current user's role id was added
to the generated SQL.

Loading Mondrian
When loading the Mondrian dataset (MondrianFoodMartLoader) the
two test permission tables were created (product_permissions and
customer_permissions), three test user roles created with entries
in the permission tables and aggregate table generated that
included the normal roleup with default role id -1, as well as
roleups for the three test user roles.

It all worked just fine. Aggregate tables were used as needed and the
in-memory cache, cached data per role.

The fact that I had created aggregate table entries per role id
was deemed unacceptable, so I backed out
all of the runtime and db load code associated with the "role_id" column
in aggregate tables and in the AggregationManager. Now, if the Role
was a Role.CellLevel, I forced the use of the base fact table
with role id base permission where-clause expressions - aggregate tables
were no longer used if the user had a role id.

This also worked but was slower. Trying to figure out how to generate
auto-magically the correct constraints based upon current Role's accesses
permissions within the RolapStar code (which may be the correct way
to do it) was too hard to do in the two days remaining before
code freeze - so I punted.

I should mention that in the JPivot MondrianModel initialize() method
I take the MondrianModel's new Role instance variable and if not null,
set the Connection's Role to it. It is the JPivot MondrianModel
that our JPivot/Mondrian container has access to.

A final thought, the approach I took to creating and using
aggregate tables (adding a role_id column, rolling up per role id,
and generating SQL with the where clause predicate "role_id = ROLE_ID")
was deemed "not olap-like". Of course, it certainly is rolap-like since
rolap involves using (leveraging) a relation database to generate olap
results.
Except for the fact that creating all of the additional rollups, one per
role, for the aggregate table - which is a true negative. I view the
distinction as being orthodoxy versus practicality. It would certainly be
good if the role-base constrains could be added to the generated SQL
by the RolapStar (AggStar) code, but if you have a situation where
there are hundred of thousands of leaf members and, say, a given
role is permissioned to access fifty thousand of them (and these leaf
level members can not be summarized as access to their parent(s) members
- a parent member has 100,000 leaf members but only 50,000 are
permissioned for the role in question) then one can not generate
SQL that explicitly list the permitted leaf members; it just does
not scale. Rather if one took a rolap approach and inserted the
PermissionExpr(s) SQL into the generated aggregate table query SQL,
the database would be doing the work and one could have large numbers
of permissioned leaf members for a given role.

Richard


--
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.
_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Richard Emberson
05-14-2007, 05:10 PM
One additional item that was discovered to be needed:
Just as in JPivot, one may wish to create one's own
implementation of the Role interface, so to with XMLA.
Trouble is currently the context map created in XmlaServlet
is a Map<String,String> rather than a Map<String,Object>,
the DefaultXmlaServlet expects to extract a role's name
from the context map rather than a Role object,
and the DefaultXmlaRequest takes a Role name rather
than a Role object. It is also true that the code in
RowsetDefinition gets the Role name from the XmlaRequest
rather than the Role object.
Now it could be that both should be supported, but
being able to support a Role object is (at least for me)
a must.


Richard




Richard Emberson wrote:
> Mondrian Roles and Cell Level Access - A Post Mortem
>
>
> Current Mondrian allows one to permission a Hierarchy's leaf level
> members. Queries run at the leaf level will then only display
> those permissioned members. Queries at higher levels, that is, at
> aggregate levels, with such a permissioning, the results do not
> reflect the leaf level permissioning.
> Rather, all leafs are aggregated into the generated results.
> For some, this is the behavior that is desired, but, for others,
> this is viewed as a security hole - a user should only see what
> they are permissioned to see even at higher, aggregate levels.
> I have just such a customer requirement.
>
> Last month I attempted to patch Mondrian to address this requirement.
> A further requirement is that a given dimension can have up to 1
> million leaf members.
> While these million leaf members will not have the same parent member,
> one could have tens of thousands of leaf members that do have the same
> parent. What is important about this requirement is that it
> pretty much says that one can not enumerate the permitted member names
> in some SQL clause - there are just too many.
> Most customers do not have such large dimensions nor do they have
> so many leaf level siblings - FoodMart certainly does not, but it can occur
> (I have such a customer).
>
> A further requirement is that performance be comparable to a
> non-permissioned
> system. This implies that aggregate tables still be used in some
> manner and that the in-memory aggregate cache also be used.
>
> The last requirement was that the modifications to Mondrian be small - I
> did not want to re-write whole parts of the system.
>
> The Requirements:
>
> 1) Aggregate results include only data from permissioned members.
> 2) Support large number of permissioned members.
> 3) Performance using aggregate tables and in-memory aggregate cache.
> 4) Point modifications to Mondrian only.
>
> What I ended up with was a version of Mondrian that fulfilled all the
> requirement but the requirement dealing with aggregate tables.
> I actually had an earlier version that even allowed the use
> aggregate tables but some had issues with the approach I took,
> so I backed it out.
>
> Changes to Mondrian code:
>
> Role.CellLevel
> First, I extented the Role interface with a new interface called
> Role.CellLevel with the following additional methods:
> int getId();
> boolean hasAggregateTables();
> boolean hasAllPermissions();
> void initialize(Schema s, DataSource dataSource);
>
> In the initialize() method in the Role.CellLevel implementation class,
> for each Hierarchy in the schema that was permissioned,
> walked down the Hierarchy's Levels determining the Access value per
> Member
> for the current user (role).
> This approach was taken because our member-role permissions were
> stored in the database for leaf-level members only and the permissions
> of intermediate level members had to be inferred from that of their
> child members.
>
> Also, the Role.CellLevel interface has the following inner (but static)
> class:
> public class Context {
> public static final ThreadLocal<Role> Current =
> new ThreadLocal<Role>();
> }
> This was used to pass the Role down the Mondrian execution stack.
> RolapConnection
> In RolapConnection's setRole() method, if the Role is an instance of
> the
> Role.CellLevel, then its initialize() method is called allowing one to
> initialize the Role from the Schema and DataSource.
>
> In the RolapConnection's execute() method, the RolapConnection's
> current
> Role was registered with the current thread by calling:
> Role.CellLevel.Context.Current.set(getRole());
>
> RolapStar aggregate cache
> Role-based aggregate caching in RolapStar was implemented by
> changing all of the caches from using a BitKey as the template Map key
> to using an object which contains both the BitKey and the thread's
> current Role.
> No other changes were required to all that RolapStar aggregate
> caching code!!!
>
> Sql generation and schema definition
> In the schema definition of a Hierarchy, I added a new element,
> "PermissionExpr" (same base type as "MemberExpr") that contained
> a where-clause predicate to be used for determining the access
> permission of a given role for that Hierarchy. For example, the
> Customer Hierarchy might have the template predicate (for mysql):
>
> `FACT_TABLE_NAME`.`FACT_TABLE_FOREIGN_KEY` in (
> select `customer_permissions`.`customer_id`
> from `customer_permissions` as `customer_permissions`
> where `customer_permissions`.`role_id` = ROLE_ID )
>
> Where the key word FACT_TABLE_NAME and FACT_TABLE_FOREIGN_KEY
> are replaced with the query's Cube's fact table and the
> foreign key to the Hierarchy table and the key word ROLE_ID
> is replaced with the current user's role id value prior to
> SQL execution.
>
> In the low-level classes where SQL is generated,
> mondrian.rolap.agg.AggregationManager
> mondrian.rolap.agg.SegmentArrayQuerySpec
> mondrian.rolap.agg.AbstractQuerySpec
> I made modification so that when the current thread's Role was of type
> Role.CellLevel, then, for each Hierarchy in the query that had
> the schema template where-clause predicate (in the PermissionExpr
> element),
> the predicate's key words were replaced and it was added to the
> queries generated SQL.
>
> Aggregate Tables
> For aggregate tables, I added a new column, called "role_id" to
> all aggregates, created three roles with permission tables for
> the Product and Customer dimesions and modified the db loading code
> so that it created aggregate table rollups for not only the default,
> full permission, role; but also the other three partially
> permissioned roles.
>
> In the class
> mondrian.rolap.agg.AggQuerySpec
> the where-clause predicate
> role_id = ROLE_ID
> where ROLE_ID is replaces with the current user's role id was added
> to the generated SQL.
>
> Loading Mondrian
> When loading the Mondrian dataset (MondrianFoodMartLoader) the
> two test permission tables were created (product_permissions and
> customer_permissions), three test user roles created with entries
> in the permission tables and aggregate table generated that
> included the normal roleup with default role id -1, as well as
> roleups for the three test user roles.
>
> It all worked just fine. Aggregate tables were used as needed and the
> in-memory cache, cached data per role.
>
> The fact that I had created aggregate table entries per role id
> was deemed unacceptable, so I backed out
> all of the runtime and db load code associated with the "role_id" column
> in aggregate tables and in the AggregationManager. Now, if the Role
> was a Role.CellLevel, I forced the use of the base fact table
> with role id base permission where-clause expressions - aggregate tables
> were no longer used if the user had a role id.
>
> This also worked but was slower. Trying to figure out how to generate
> auto-magically the correct constraints based upon current Role's accesses
> permissions within the RolapStar code (which may be the correct way
> to do it) was too hard to do in the two days remaining before
> code freeze - so I punted.
>
> I should mention that in the JPivot MondrianModel initialize() method
> I take the MondrianModel's new Role instance variable and if not null,
> set the Connection's Role to it. It is the JPivot MondrianModel
> that our JPivot/Mondrian container has access to.
>
> A final thought, the approach I took to creating and using
> aggregate tables (adding a role_id column, rolling up per role id,
> and generating SQL with the where clause predicate "role_id = ROLE_ID")
> was deemed "not olap-like". Of course, it certainly is rolap-like since
> rolap involves using (leveraging) a relation database to generate olap
> results.
> Except for the fact that creating all of the additional rollups, one per
> role, for the aggregate table - which is a true negative. I view the
> distinction as being orthodoxy versus practicality. It would certainly be
> good if the role-base constrains could be added to the generated SQL
> by the RolapStar (AggStar) code, but if you have a situation where
> there are hundred of thousands of leaf members and, say, a given
> role is permissioned to access fifty thousand of them (and these leaf
> level members can not be summarized as access to their parent(s) members
> - a parent member has 100,000 leaf members but only 50,000 are
> permissioned for the role in question) then one can not generate
> SQL that explicitly list the permitted leaf members; it just does
> not scale. Rather if one took a rolap approach and inserted the
> PermissionExpr(s) SQL into the generated aggregate table query SQL,
> the database would be doing the work and one could have large numbers
> of permissioned leaf members for a given role.
>
> Richard
>
>


--
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.
_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian