Hitachi Vantara Pentaho Community Forums
Results 1 to 21 of 21

Thread: Use of loop in pentaho kettle

  1. #1
    Join Date
    May 2008
    Posts
    232

    Default Use of loop in pentaho kettle

    Hi all....
    I have a requirement where i have more than 1000 excel documents in a folder, i need to take values from those excel document from the same sheet(say "sheet1") and same cells(say "B23:K32") of each excel document. For that i need a "for loop" kind of thing so that the same transformation can be run 1000 times just by changing the excel document......
    Is there any method available in pentaho for looping or any other method to do the meet the above requirement.


    Thanks for help in advance...

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    Sure but it requires some setup. Maybe we should add an example to the samples directory that processes multiple input files.

    Essentially it would boil down to reading the file list and then per filename execute the actual processing...

    Getting to specific cells in excel is another problem.

    Regards,
    Sven

  3. #3
    Join Date
    May 2008
    Posts
    232

    Default

    Hi Sobden.....

    Thanks for reply

    Access to the specific cells in excel file is not a problem i, it can be done by giving the values in "start row" and "start column" in the "Sheets" tab of "excel input".
    The main problem is looping .. i can't have 1000 transformations to access 1000 different files!!!!
    Can you please tell me in details what changes i will have to do so that my transformation can access multiple files.

    regards

  4. #4
    DEinspanjer Guest

    Default

    Look in the samples/jobs/run_all directory of your kettle installation.
    This set of jobs and transformation gets a list of all .ktr files in a directory, then for each one of them, sets a variable with the file name and executes that transformation using that variable.

    This example demonstrates the mechanism of getting a list of files and doing something with each one of them by running in a loop and setting a variable.

    Don't get confused by the fact this example is executing a bunch of transformations. Your logic will require only one transformation. Basically:

    1. create a transformation that reads the excel document properly.
    2. set the filename for the excel document to be a variable such as ${CURR_EXCEL_FILE}
    3. Use the two jobs and one transformation in the run_all example
    4. In your "Define FILENAME Variable and execute.kjb", instead of using the filename as the name of the transformation, just give it the name of the transformation you created in step #1 above.

    p.s. The magic of this example is in the checkbox "Execute for every input row" that is enabled in the "Job entry" dialog of the
    Run all sample transformations.kjb
    Last edited by DEinspanjer; 10-09-2008 at 01:25 AM.

  5. #5
    Join Date
    May 2008
    Posts
    232

    Default

    Thanks DEinspanjer

    for giving the explanation with example. but i am still not able to understand the sample run_all and also the steps you have given to read 1000 excel sheet in one transformation through loop.

    will i have to make the variables of each and every excel file say 1000 variables for 1000 files......

    Is there any other way to proceed ....
    I am finding the looping part in kettle a bit tidious....

    Please help me

    Regards

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    > will i have to make the variables of each and every excel file say 1000 variables for 1000 files......

    Yes, but it's 1000 times the same variable :-D

  7. #7
    Join Date
    May 2008
    Posts
    232

    Default

    but each file name have different name....so it will have different varible right?? otherwise if we will have same variable for each excel file it will ececute only one file ...how will it traverse to the next file .

    Can you please explain me in details about how to use the looping.....
    I am a bit confused i think....


    Thanks a lot for your cooperation and help....
    Regards

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    but each file name have different name....so it will have different variable right?? otherwise if we will have same variable for each excel file it will ececute only one file ...how will it traverse to the next file .
    Obviously not! Who would come up with such a stupid system?
    Daniel already explained how to do it. If that's not enough, maybe you need professional help. (perhaps even from Pentaho)

  9. #9

    Default

    You mean.... pay for support!!?!
    This is a signature.... everyone gets it.

    Join the Unofficial Pentaho IRC channel on freenode.
    Server: chat.freenode.net Channel: ##pentaho

    Please try and make an effort and search the wiki and forums before posting!
    Checkout the Saiku, the future of Open Source Interactive OLAP(http://analytical-labs.com)

    http://mattlittle.files.wordpress.co...-bananaman.jpg

  10. #10
    Join Date
    Sep 2007
    Posts
    834

    Default

    meenaiyer,
    Daniel already explained the way to do the loop.
    Here you have the same, explained in other words

    You'll need a job with 3 transformations:

    transf1 --> transf2 --> transf3

    transf1:
    Obtain the lists of files to process (*.xls), and "copy rows to result"

    transf2: ("executes for every input row" checked)
    for every row, "gets rows from result", and set a variable ${XLS_NAME} which will contain the name of the excel to be processed.

    transf3:
    Executes your main transformation. when you open the xls file, you use the variable ${XLS_NAME} as the name of the file.

  11. #11
    Join Date
    May 2008
    Posts
    232

    Default

    Thanks Maria
    thanks a lot for your help...

    i want to ask you some more doubt...if you don't find it silly to solve.....

    when i run the job containing all the three transformations defined by you....i see in the log file that the variable is pointing to different excel file but the output which i get are the values from only the last excel document. I need the value from all the excel files to be appended in the output database..

    Thanks for your help...

    Regards
    Last edited by meenaiyer; 10-14-2008 at 02:55 AM.

  12. #12
    Join Date
    Sep 2007
    Posts
    834

    Default

    hi,
    try this:
    - set the log level higher. This could help in seeing what is happening,
    or
    - verify if 'truncate' is checked in the table output step. It shouldn't.
    or
    - check the ddl for the table in which you are inserting data. Maybe the last data is overwriting the first, or maybe there are some constraints that don't allow you to do the inserts you're trying to do.

    I hope it helps!

  13. #13
    Join Date
    Jan 2010
    Posts
    8

    Default Problem with loop in linux

    Hi guys

    I try made loop on windows and i don't have trhowbles but when transfer my .krt and .krb to linux just work for the first file, may you help me please thanks.

    Regards
    Leonardo Amezquita

  14. #14

    Default

    Quote Originally Posted by Maria Roldan View Post
    meenaiyer,
    Daniel already explained the way to do the loop.
    Here you have the same, explained in other words

    You'll need a job with 3 transformations:

    transf1 --> transf2 --> transf3

    transf1:
    Obtain the lists of files to process (*.xls), and "copy rows to result"

    transf2: ("executes for every input row" checked)
    for every row, "gets rows from result", and set a variable ${XLS_NAME} which will contain the name of the excel to be processed.

    transf3:
    Executes your main transformation. when you open the xls file, you use the variable ${XLS_NAME} as the name of the file.
    Sorry for reviving, but...

    This is not correct, since transf2 will set "x" times the variable ${XLS_NAME} but transf3 will only use one of them.

    I'm trying to do that without any success. transf3 does run "x" times like its job father transf2 but AFTER transf2 runs "x" times.
    I need a schema like that (I'm describing it like it is sequencial and not parallel):

    transf1: get all filenames
    transf2: get one of them, send to transf3 and wait for it to be processed by it. After it's done, send another filename to transf3
    transf3: do its job.
    Last edited by gabriel.pugliese; 07-29-2010 at 05:18 PM.

  15. #15
    Join Date
    Sep 2007
    Posts
    834

    Default

    You're right,
    maybe it worked 2 years ago
    In order to get it work, you have to embed ktr2 and ktr3 in a subjob, and check the "execute for every input row" in the "Job" job entry.

  16. #16

    Default

    Quote Originally Posted by Maria Roldan View Post
    You're right,
    maybe it worked 2 years ago
    In order to get it work, you have to embed ktr2 and ktr3 in a subjob, and check the "execute for every input row" in the "Job" job entry.
    Yeah, but there is a bug in the newer stable version that you can't send variables as parameters to a subjob. Maybe in 4.0.1 or in later versions it works.

  17. #17
    Join Date
    Feb 2009
    Posts
    321

    Default

    take a look at this link: http://jira.pentaho.com/browse/PDI-4124

  18. #18

    Default

    Quote Originally Posted by hernanthiebaut View Post
    take a look at this link: http://jira.pentaho.com/browse/PDI-4124
    Yeah
    Just tried v3.2 and It works perfectly. But there is no java filter I had a lot of them in 4.0.0!

  19. #19

    Default

    you could check kettle 4.0.1 out of svn and use that, the build is a piece of cake just ant zip
    This is a signature.... everyone gets it.

    Join the Unofficial Pentaho IRC channel on freenode.
    Server: chat.freenode.net Channel: ##pentaho

    Please try and make an effort and search the wiki and forums before posting!
    Checkout the Saiku, the future of Open Source Interactive OLAP(http://analytical-labs.com)

    http://mattlittle.files.wordpress.co...-bananaman.jpg

  20. #20
    Join Date
    May 2010
    Posts
    3

    Default Iterating thru transformation in a job

    Hey guys. I am looking for some help. I have a database table returning multiple rows and for each of the row, the result should be passed as a parameters to another transformation.Have tried all the options mentioned in this thread however it didnt help me resolve the issue.
    I am using 3.2 version of kettle.
    Any suggestions?

  21. #21
    Join Date
    Sep 2009
    Posts
    810

    Default

    Hi there,
    I recall 3.2.0 having issues with the "execute for each row" option on transformations (works well on jobs though). Get yourself a 3.2.5 build or 4.0.1 if you can switch to it. If you're stuck with 3.2.0, I'd recommend putting the transformation you need executed repeatedly into a helper job. Make the helper job "execute for each row" then.

    Cheers

    Slawo

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.