Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Can I make this more efficient?

  1. #1
    Join Date
    Aug 2008
    Posts
    563

    Default Can I make this more efficient?

    Hello,
    I am quite new to PDI. I have been adapting the 'process all tables' example for my needs.

    Let me describe you my jobs and transformations:
    1. We have about 100 tables (for each of our products) that store all the customers that bought that product. The first job and transformation loops through these tables and merges them together into one huge table, which is output into one database table product_sales.
    2. The second job and transformation gets that unique customers from this huge table and adds a customer_id to each customer and output this into a dabase table unique_customers.
    3. In the next job and transformation I read in unique_customers and product_sales, merge them. Now the product_sales table has customer_id and I can drop some other columns/data points.

    My question is: Instead of exporting the data each time into a table, can I not pass on the data to the next job? Also, job 2 has to be done before job 3 and job 3 uses the results from both job 2 and job 3.

    Thanks a lot,
    Diddy

  2. #2
    Join Date
    Feb 2009
    Posts
    296

    Default

    You could also consider doing all that in one transformation.

    Use the combination lookup step or the dimension lookup step to create the unique_customers table - they will also supply you with the ID field.

    This should work pretty well and reduce the database load.
    Fabian,
    doing ETL with his hands bound on his back

  3. #3
    Join Date
    Aug 2008
    Posts
    563

    Default

    thanks a lot for your answer!
    It is not really possible I think to do everything is one step, as the loop job has to finish first, so that I have all products that all costumers bought. Only then I can create the unique customers with id.
    I want this unique customers to be saved to a table, but I don't want to save the result of job 1. Once I have the customer ids, I could update the result set from job 1.

    I will have a look at these lookup stepts that you mentioned and see if they help.

    Best regards,
    Diddy

  4. #4
    Join Date
    Feb 2009
    Posts
    296

    Default

    Do you really need to have all the customers available at the same time?
    I'm thinking of a structure like this:

    * Loop Job
    * Transformation pulls all the customers + sales from one of your product tables
    * Transformation does a lookup against the target database's customers-table
    * If the customer exists it takes the ID (*)
    * If the customer does not exist it inserts it (*)
    * Transformation uses the ID and the sales data to insert into the big target sales table

    (*) This is done by the "Combination lookup/update" step

    If you do not have other transformations to be done all this should be easily done by using the "table input", "combination lookup/update" and "table output" steps. Three steps in the transformation plus the loop job and you should be fine.
    Fabian,
    doing ETL with his hands bound on his back

  5. #5
    Join Date
    Aug 2008
    Posts
    563

    Default combination lookup-update

    Hi Fabian,
    The combination lookup-update is a very interesting and quite useful step. I just had a look at it. I will try to implement this step now. Probably soon I'll have some more questions
    Thanks a lot,
    Diddy
    Last edited by diddy; 03-18-2009 at 06:35 AM.

  6. #6
    Join Date
    Aug 2008
    Posts
    563

    Default Conclusion

    Well, this works perfectly! Amazing! This is really a very very useful step!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.