PDA

View Full Version : Tips for working off de-normalized source data



Mike Deutsch
01-24-2008, 06:38 PM
Hi,

Can anyone offer advice on how to get off the ground w/ Metadata built off a single de-normalized db view, rather than a star schema?

My data source is a data warehouse view, already normalized and ETL'd off of a dozen or so star-schema source tables. To use Pentaho's Steel Wheels metaphor, you could say that I've collapsed my star schema of Products, Customers, Territories, Sales, etc. into a many-column view. This was done for local db reasons -- not to say that it would have to stay that way.

If I materialize the view into a table, I'm able to connect from the Metadata Editor and "import the physical table" into my model. I imagine I can proceed from there as usual, just creating multiple Business Tables from this one Physical Table, and using the columns of a single dimension for each Biz Table. It's not documented, but it seems like it would work. From there I have some questions.... If anyone can suggest a better way to address these, I would appreciate the input.

1) Suppose I want to clean up these identical Biz Tables, because each only uses a handful of the 50+ columns I see in the Properties window. To do this I would right-click and Edit each Biz Table and remove the columns that I don't want to use. Is there a way in the GUI to short-cut this so I don't have hundreds of clicks? Or can you create each Biz Table with only the columns you want, not the full list from the Physical Table?

2) Are my metadata model and all the Pentaho components built on top of it going to be inefficient because they are built on a denormalized view, rather than a star schema? Are there concrete reasons to re-normalize my source data just for use in Pentaho, if it has already been de-normalized for other reasons?

Input and suggestions (or questions) are appreciated.

Thanks,
Mike

MattCasters
01-29-2008, 06:42 AM
It's not documented, but it seems like it would work.

I checked (http://wiki.pentaho.org/display/studio/00.+Metadata+Terminology#00.MetadataTerminology-BusinessTable), and indeed it's not mentioned specifically. That is rather unfortunate since this one-to-many mapping is exactly at the core of the whole metadata system.

1) No, not yet. Please do create JIRA cases for these feature requests. I'll keep an eye on them in the coming months. We're re-doing a lot of stuff at the moment actually.

2) For reporting a de-normalized (multi-dimensional or star-schema) model is most efficient. Not just that, security is typically also a lot easier to implement, it's easier to extend, has more re-use potential, etc.

All the best,

Matt