PDA

View Full Version : Multiple primary key question



atoor
12-14-2007, 09:48 AM
I am using the Pentaho MetaData editor version 1.6.0.GA.162 and I have a question about how to get it specify a query in the correct way. The situation I have is as follows:

I have 5 database tables that look like

1821

The primary key from Table4 is a foreign key on all other tables, and forms a part of the primary key for each table.

When I use the query editor and add columns from only Table1 and Table3, it generates a query that only specifies the link between Table1->Table4 and Table3->Table4. I need the query to also add in the relationships from Table1->Table2 and Table3->Table2.

Additionally, if I have columns from only Table5 and Table3, it generates a query that only specifies the link between Table5->Table4 and Table3->Table4. I need the query to also add in the relationships from Table5->Table1, Table1->Table2 and Table3->Table2.


How can I specify my metadata to have this happen?


Thanks!

-Andeep Toor
________
REINSURANCE ADVICE (http://www.insurance-forums.org/reinsurance/)

wgorman
12-14-2007, 12:41 PM
Hello Andeep,

Individual relationships in Metadata are considered independent, MQL will search for the shortest relationship path between two tables to determine the joins and non-included tables to add to the SQL.

I'm interested in looking at your database schema, there may be another way to represent the relationships either via views or with another relational design approach.

Will

atoor
12-14-2007, 01:04 PM
Will

I don't believe that I can send the database schema because it is of a sensitive nature, but I can answer any questions that you might have about it.

We are currently using views. In the example that I provided before fields from Table3 are in View3 and fields from Table1 and in View1, etc.

For finding the shortest path, as you mentioned, is there any way to specify that a path needs to be included, but is not on the critical path?

As I mentioned before Table4.ID is the primary key for Table4, but also makes up part of the primary key for all other tables.
For example, Table1's primary key would be something like:
PRIMARY_KEY(Table1.ID, Table1.TABLE_4_ID)

Table5's primary key is more complex, like
PRIMARY_KEY(Table5.ID, Table5.TABLE_5_ID, Table5.TABLE_4_ID)

In addition, joins between tables are complex, as they involve these multiple primary keys:

Table1->Table2 is (Table1.ID = Table2.TABLE_1_ID AND Table1.TABLE_5_ID = Table2.TABLE_5_ID)

-Andeep
________
HONDA 1300 HISTORY (http://www.honda-wiki.org/wiki/Honda_1300)

MattCasters
12-14-2007, 01:30 PM
The whole idea behind BI in general and Pentaho metadata specifically is to make it as easy as possible and as fool-proof as possible for the user to create reports and get information out of the data.

Creating wacky data model and circular references kinda conflicts with that idea.
What I mean by that is that if you can't put all of this into a single model without creating problems for yourself or PMD, it's not going to be trivial for the user either. Look at the mess that BO for example made out of that problem.

So if that's what it takes to make it fool-proof, create 2 business models in the same domain. Do whatever you need to do to make it simple and fool-proof for the user. While you're at it, why not create a real data warehouse.

You'll find that your problems will go away.

Matt

atoor
12-14-2007, 02:13 PM
Matt

The database structure, in our case, is not something we have control over. It is what it is, and we are tasked to use it. Given that it is fairly wacky, the database structure is not totally out of the ordinary, as this specific situation involves tables having multiple fields that make up a single primary key. To my knowledge, this is something that occurs in many RDBS designs...it may not be the best idea in the world, but it does happen, and I'm hoping that Pentaho can support this structure.

I understand your point of view, especially if we had control over how things are structured, but in the case that we have does Pentaho not support what we are trying to do?

In particular, I took your suggestion of trying to create a separate business model, but I don't see where that gets me. We still need to select columns from a number of views and have the correct sql get generated. Correct me if I am wrong, but I thought that the columns that you can pick from to generate a query would have to be in the same business model?

Is there anything else we can try?
________
Electric Cigarettes (http://vaporizer.org/forum/electronic-cigarettes/)

MattCasters
12-14-2007, 03:17 PM
Your point of view is noted, however, using 3rd normal form databases to do business intelligence on *is* indeed something I consider out of the ordinary.
Since you do have read access to the tables, you *can* in fact create a data warehouse on the data.

But back on topic, to the best of my knowledge, you can indeed specify composite keys between 2 tables.
IIRC it's the "complex join" option.

Matt

atoor
12-14-2007, 03:42 PM
Matt

Yes, we are already using complex joins option when specifying relationships. I appreciate the data warehousing points that you bring up, but in our case it may not be a viable option.

Is there another way we should be specifying our relationships to get a query to generate correctly? Is there anything else we can try?

I am currently looking into the BusinessModel class in both the createSQL and findShortestPath methods (which are called from the MQLQuery class when generating a query). Is any way to get the code to recognize the distinction of a table relationship that should be added to a path, but should be ignored in terms of figuring out when the path is created? The path I am referring to is the collection of relationships that will eventually be used to create a sql string.

To note, I have gone back and tried some more simple queries and have found what appears to be a similar problem. In this simple example I removed the link to Table4 to test stuff out temporarily. It looks like this:

1823

I am doing a query that selects from a column from Table5 and one that selected from Table3. The query I get back includes joins from Table1, Table2, Table3, but also erroneously includes Table6's relationship. Why is this happening? Shouldn't this get removed as it shouldn't be a part of the shortest path?

To correct myself from the first post as well, I didn't mention that in all the queries I was doing, I was filtering on fields from Table4.

I hope this helps towards solving this issue. Please let me know if there is more info you require.
________
Honda XL250 (http://www.honda-wiki.org/wiki/Honda_XL250)

MattCasters
12-15-2007, 08:31 PM
There is a property called "Relative size" that will allow you to do scoring on the paths.

If there are multiple paths: A-B-C, A-D-C, A-E-C for example, you can set a relative size of B=5, D=3, E=7
As such, the score of the A-D-C path will be the lowest and taken.

The path of the least resistance if you can put it like that.
See if that helps you out in this situation.

Matt

atoor
12-17-2007, 09:20 AM
Matt

Is this a property that I can set on a table in the MetaData Editor, or is this something I can only manipulate in the Pentaho classes themselves?

Thanks,

-Andeep
________
Bmw M40 History (http://www.bmw-tech.org/wiki/BMW_M40)

MattCasters
12-17-2007, 11:42 AM
The former:

1837

atoor
12-17-2007, 01:00 PM
Matt

Thanks for the info... I will give this a try. Looking carefully at your explanation, however, this seems like this will help only in the case where two paths are equal in length. Is this correct?

If so, is there a way, instead, to assign a weight to the relationship instead of the table? If there was, I could see being able to specify that a path that includes a certain relationship (even if it is longer than other paths) is more favorable.
________
Vaporizer Affiliate Programs (http://vaporizeraffiliateprogram.com)

MattCasters
12-17-2007, 02:18 PM
We've been having a couple of design sessions on the topic this week.
It looks like that requirements such as yours are so far out of bounds that it's more of a round peg in a square hole story.

However, architecturally the idea is to provide you with API hooks (factories) that will allow you to add properties to the model that you can parse in your own SQL generating engine.
In other words: I don't think there is any BI tool out there that does what you want (automatically include other non-selected tables) and we are not really interested in supporting it, but in the true open source spirit, we don't really want to exclude the possibility.

Matt