Hitachi Vantara Pentaho Community Forums

Thread: Best way to handle many parallel transformations in a job

  1. #1


    I am building a Data Vault load with PDI. Since the Hubs first, and then the Links and Satellites, can each be loaded in parallel or at least in arbitrary order, I have jobs that contain a START and a SUCCESS entry with many transformations between them, placed side by side rather than in series. The Data Vault keeps expanding, so these jobs are becoming unreadable. What is an efficient way to handle this? I would love to make it compact.

  2. #2


    Anybody from the 100 viewers?

  3. #3
    Join Date
    Feb 2017
    Posts
    7


    My 2 cents. If I understand your question, you are looking for readability of your code as your DV keeps growing, correct?
    You can put three or four KTRs in a single parallel path, and on each KTR you can specify that it runs in parallel and unconditionally.

  4. #4


    If you have multiple transformations that should run in parallel, you might put those into a sub-job (e.g. a "Hubs" sub-job with all of the "Hubs"-related KTRs) and have those KTRs run in parallel. You can then build sub-jobs for Links and Satellites, so your top-level job would just be "Start > Hubs > Links > Satellites > Success".


    Just remember that you're creating a new thread for each step copy, so the more parallel transformations you have running, the more likely you are to get threads thrashing across the CPU cores, which could lower performance.
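
    To illustrate the trade-off outside of PDI, here is a minimal plain-Python sketch, not a PDI job. It runs the stages one after another, runs the members of each stage in parallel, and caps the number of worker threads so that a growing Data Vault does not mean an ever-growing number of threads. The stage contents, the `load` function and the `max_workers` value are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage layout; the names stand in for the real KTRs.
STAGES = {
    "Hubs": ["hub_customer", "hub_product"],
    "Links": ["link_customer_product"],
    "Satellites": ["sat_customer", "sat_product"],
}


def load(name):
    """Placeholder for running one transformation; returns its name."""
    return name


def run_stages(stages, max_workers=4):
    """Run stages in order; run the members of each stage in parallel."""
    finished = []
    for stage, members in stages.items():
        # The pool is closed before the next stage starts, so every
        # member of a stage completes before the following stage begins.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            done = list(pool.map(load, members))
        finished.append((stage, done))
    return finished


if __name__ == "__main__":
    for stage, done in run_stages(STAGES):
        print(stage, done)
```

    Lowering `max_workers` trades some parallelism for less contention between threads; the right value depends on the machine and on how many step copies each transformation starts.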

  5. #5
    Join Date
    May 2014
    Posts
    358


    Try thinking in the direction of reusable, metadata-driven transformations. Then maybe you can have one transformation running in multiple threads on different tables. Maybe you need to wrap a job executor in a transformation to do it.

  6. #6


    I already have sub-jobs. This is no longer sustainable, as I have 30+ hubs, links and sats (per type), and I have about 90 staging tables from one source to fill. Is there a way to get a list of names to feed to a transformation step in a job, so that it runs those? I have looked for that, but fail to see it. I also miss that as a metadata option.

  7. #7
    Join Date
    May 2014
    Posts
    358


    I don't understand your question.
    What I meant is this. In a transformation, get the list of tables you want to stage (from a configuration file, or by deriving it from the DB catalog of your DWH and the source). Send this list to a "Copy rows to result" step. In a job, configure the next transformation to execute for each input row, set up that transformation with a parameter, and copy the table name to the parameter.
    I once built a prototype like this. It had some quirks, though, so in the end we didn't use it in production.
    I generated the SQL query and passed it to the parameters along with the destination table name and the primary key field name. There were three transformations in total:
    #1 prepares the list of tables
    #2 generates the SQL query and sends it via Metadata Injection to #3
    #3 does the actual work
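
    The three-part layout above can be sketched in plain Python to show the data flow. This is only an illustration, not PDI code: the table list, the shape of the generated SQL and all function names are assumptions.

```python
def list_tables():
    """Part #1: prepare the list of tables (hypothetical hard-coded config)."""
    return [
        {"source": "customer", "target": "stg_customer", "key": "customer_id"},
        {"source": "orders", "target": "stg_orders", "key": "order_id"},
    ]


def build_query(row):
    """Part #2: generate the SQL query for one table (assumed query shape)."""
    return f"SELECT * FROM {row['source']} ORDER BY {row['key']}"


def do_work(query, target, key):
    """Part #3: placeholder for the transformation that does the actual work."""
    return f"{target} <- {query} (key: {key})"


def run_all():
    """Run parts #2 and #3 once per row from part #1, one row at a time."""
    return [
        do_work(build_query(row), row["target"], row["key"])
        for row in list_tables()
    ]


if __name__ == "__main__":
    for line in run_all():
        print(line)
```

    Adding a table then means adding one row to the list rather than one more transformation to the job.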

    This would run sequentially. To make it run in parallel, you'd need to shoehorn the "job & job executor trans-step" pattern I suggested earlier into that mix. Or, if you have single-purpose transformations, just pass their names to it; then you can set the job executor step to run in multiple copies.
    Stay away from the Transformation executor step unless you want something bad to happen to you.
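
    As a rough picture of what running in multiple copies means for a list of names, here is a plain-Python sketch. It assumes the incoming rows are handed out to the copies in round-robin order and that each copy works through its own share in its own thread; `distribute`, `run_copy` and `run_in_copies` are hypothetical names, not PDI APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, copies):
    """Assign rows to `copies` buckets in round-robin order."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    buckets = [[] for _ in range(copies)]
    for index, row in enumerate(rows):
        buckets[index % copies].append(row)
    return buckets


def run_copy(bucket):
    """Placeholder for one copy executing the jobs named in its bucket."""
    return [f"ran {name}" for name in bucket]


def run_in_copies(rows, copies):
    """Process each bucket in its own thread, one thread per copy."""
    buckets = distribute(rows, copies)
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(run_copy, buckets))


if __name__ == "__main__":
    print(run_in_copies(["hub_a", "hub_b", "hub_c"], copies=2))
```

    Within one copy the names still run one after another, so the number of copies sets the degree of parallelism.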


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.