Hitachi Vantara Pentaho Community Forums
Results 1 to 14 of 14

Thread: To feed a fact table

  1. #1
    Join Date
    Aug 2008
    Posts
    6

    Exclamation To feed a fact table

    Hi all,

    I'm using Kettle (Spoon) for the first time, and i have already realize some work on this tools. But i encounter a problem that i don't understand, and i'm unable to solve even with topic of this forum.

    I'm working with mySQL v5.1 and kettle v3.1.

    I have to feed a fact table with 8 another table that i have already succed to feed. These 8 tables are feed with flat files.
    My problem is that i don't know how to feed my fact table. I have tried to do it in 1 transformation.
    in that transformation, the 8 tables are feeding my fact table, but when i connect a second or more table, it advert me that this will not work... To pass througt that, i have tried a cross mapping, but nothing is working...
    And when i have launch one time, i must restart Spoon to re-launch the transformation and test it...

    I hope you will understand my problem, sorry my english is not very good. if you need more information, ask plz, i need your help. My feeding is very simple, but i don't understand the error of Kettle.

    Thanks a lot !

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    For the relaunching, that's a bug... a JIRA was raised for it some time ago: http://jira.pentaho.com/browse/PDI-1598

    Tutorial at http://wiki.pentaho.com/display/EAI/...ttle)+Tutorial

    For what you want to do... it depends... you could possibly write 1 big SQL statement in a table input step that does all the work in the database and put the data like that in the fact table (preferred way for me if possible).

    If you need to do strange stuff with your fact table data you may need multiple transformations.

    You can think of steps as data enrichments ... you pass a hop through it and you can tag extra data to each row (providing the step can handle it).
    You could e.g. do 1 table input step and then 7 database lookups (assuming you have the key) to get at your input data, ...
    What you probably tried is 8 database lookups after each other in sequence, and that doesn't work of course.

    It all depends on your specific requirements.

    Regards,
    Sven

  3. #3
    DEinspanjer Guest

    Default

    I believe the problem you are describing is that your table schemas are not identical.
    In Kettle, you can't connect streams with different field layouts without first conforming them.
    To conform them, just use the Select Values step to select a common subset.

  4. #4
    Join Date
    Aug 2008
    Posts
    6

    Default

    Thanks for your reply !

    I'm trying tio do something extremely simple ! But i don't understand how Kettle take in charge these act.

    My fact table generate his own primary key, after she has 8 another foreign key who all comes from each table. There is no interaction between data or fields on these tables. (So lookup will not be helpfull)

    And when i try to connect directly the tables with my fact table, Kettle prevent me this will not work... I don't know how to handle this... I was thinking that a cross mapping will solve this, but it do the same warning... Is there an object who can permit me to do what i'm looking for ?

    And i repeat, there is no real difficulty in my star model, it's only that i'm lost in Kettle possibilities. Need some help to choose the right tools ^^

    Thanks for answers !

  5. #5
    Join Date
    May 2006
    Posts
    4,882

    Default

    Well... How do you know which foreign keys from each of the 8 tables belong together?

    There's nothing explicit in Kettle for star schema's... if you would be assuming that you throw some steps on your canvas in a star schema format and that that would work

    Regards,
    Sven

  6. #6
    DEinspanjer Guest

    Default

    Normally, if you have a fact table that has foreign keys to eight other tables, those other tables would be dimension tables.
    So to be able to link these foreign keys in, the main data stream would contain the natural keys for those dimensions in it. e.g. If you have a date dimension, your main data stream might have a yyyyMMdd date field in it. You would use that natural key to look up the technical key. In Kettle you would do this using either the Dimension lookup/update step or the Combination lookup/update step.

    Given the slight language barrier, it would probably be easiest if you could attach the transformation you are currently having trouble with. Being able to see what you currently have configured would probably let us figure out how to offer assistance even if we can't run it.

  7. #7
    Join Date
    Aug 2008
    Posts
    6

    Default

    I'm using foreign key who are the primary key from my 8 tables. But in these tables, there is the natural key, but i don't use it in my fact table...

    Here is my transformation, If i can help to understand....

    Thanks for help !
    Attached Files Attached Files

  8. #8
    DEinspanjer Guest

    Default

    Hrm.. Okay. This isn't really helping me understand what you are trying to do.
    You are queying for just the ID out of these 8 tables. You aren't querying for anything that would allow you to determine how the rows relate to each other.

    With just the keys from the 8 tables, the only thing you could do would to be a cross join or Cartesian join of them. If you had just 10 rows in each of these tables, that would generate over 100000000 rows in your result table.

    Are you sure this is a fact table you are trying to populate? Usually, a fact table contains rows that each relate to one specific fact about your data. E.g. "this is the number promotions that were published in this magazine on this date." That fact would come from somewhere external like a publication list or an invoice or something.

    I guess the next thing that we need to try to decipher what you are doing would be sample data. If you could give us a sample of three rows from each of the eight tables and what you would expect the resulting output to look like in the output table we'll see what we can figure out.

  9. #9
    Join Date
    Aug 2008
    Posts
    6

    Post

    I'm working on a modelisation which is very simple. They are only used to help in formation, as exemple... But i need to obtain my result with these data. I can't change them or table structure.

    Here you will find all tables and data, i'm allowed to give it. i hope this will help you to help me... ^^

    Thanks for help !
    Attached Files Attached Files

  10. #10
    Join Date
    Aug 2008
    Posts
    6

    Default

    erf... sorry if i have done a mistake, if you need something else, tell me !
    I really want to finish this part, and althougth it's not something very well construct in decisionnal term, i don't have choice, i must find a way to build this fact table without change the strucure.

    Thanks for help !

  11. #11

    Default

    Hi Kettles,

    as you said, it seems to me quite simple.
    The source of your big fact table is your FaitVentes.csv
    (But you did not include the header which contains the names of the columns. And it seems that you have another table of facts, isn't it?)

    I will explain it for first line of FaitVentes.csv.
    899811;2922;1;3;235;10;64;1;589;1432;8

    899811 is "something" that identifies ... lets say a person.
    2922 is something else that identifies ... lets say a car.
    ... (continued)

    To get a fact table you are interested in translating the somthing (899811) into the foreign key of the person (maybe the credit card number).
    Also for the 2922, you want to translate it into the key for the cars...

    If you translated all the "something"s into keys the tuple is a fact. A fact does only reference some keys of the dimensions (here person and car). It can contain some attributes describing the relationship (between car and person, e.g. the price paid).

    Did you understand that? It is the typical way of a fact table.
    Maybe you have got it another way (which we others do not know)?

    And ask if you have more questions.

    Greets, Christoph

  12. #12
    DEinspanjer Guest

    Default

    Quote Originally Posted by Kettles View Post
    erf... sorry if i have done a mistake, if you need something else, tell me !
    I really want to finish this part, and althougth it's not something very well construct in decisionnal term, i don't have choice, i must find a way to build this fact table without change the strucure.

    Thanks for help !
    I need to know what output you are expecting from these inputs. What table are you wanting to populate and what are the criteria you are wanting to populate it on?
    At the moment, based on what you said before about just combining all the different foreign keys, you'd be looking at 190 quadrillion rows for a Cartesian join. I'm hoping that isn't what you want, cause if it is, you're going to need a bigger machine.

  13. #13
    Join Date
    Aug 2008
    Posts
    6

    Default

    Ok Thanks all for answering...
    I will try to explain better... The fact table named "faitventes", i need to populate it with information wich are inside others table. In files i have give you, the fact table is already full with information, but i have to truncate it and found a solution, with kettle, to populate it with the help of others files...
    I know there is missing some "rules" to populate the table, no where i have found something who relies information...

    For the moment, i don't try to have some correct data inside the fact table. And i don't want also finish with 1000000000 rows in my table.

    I want only found a correct way to populate the fact table. I think we can call this in english, a brutal and stupid combining a foreign key...

    i hope i have been more clear. if i don't, ask me !

    Thanks for help !

  14. #14
    DEinspanjer Guest

    Default

    Unfortunately, it really just doesn't work that way. Fact tables don't get built from dimension tables, they get built from some fact source and they either lookup or insert data into the dimension tables.

    What you would normally see is a feed similar to your FaitVentes.csv file, but instead of the FKs it currently has, it would have natural keys. The ETL process would read in that csv, look up the natural keys in each of the dimensions to discover the FK and then insert the FK into the stream. At the end of the ETL, it would load that stream into your FaitVentes table. If you want an example of doing that, I'd suggest you look at the document Building-data-warehouses-using-open-source-technologies in the Kettle docs/English directory.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.