Hitachi Vantara Pentaho Community Forums

Thread: how to run multiple transformations from one job

  1. #1
    Join Date
    Dec 2006
    Posts
    15

    how to run multiple transformations from one job

    I need to create a job (using Chef) that runs about 30 transformations. Currently, I have done this by creating groups of transformations, i.e. 5 groups with 6 transformations in each. Then, in Chef, I added all 5 transformations to a job with multiple email, error and OK objects. What I really want is to create a job that reads all transformations from the repository and just executes them. Is this possible? Please let me know if you need more information or further clarification of my post. Thanks.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    It's not possible the "official" way... You could, of course, read the transformation names from a table in the repository and then execute them that way, but I would not recommend it. Are the transformations to be run going to change that often?

    Regards,
    Sven
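A minimal sketch of the approach Sven describes (get the transformation names, then execute them one by one), written as a small Python driver rather than a Kettle job. It assumes the names have already been obtained somehow, and that pan.bat accepts the same /rep, /dir, /user, /pass and /level options as the kitchen.bat call quoted later in this thread, with /trans naming the transformation; verify this against the usage text of your own pan.bat. The repository name and credentials are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical repository name and credentials; replace with your own.
REPO, USER, PASSWORD = "MyRepository", "admin", "admin"


def pan_command(trans_name: str, directory: str = "/") -> List[str]:
    """Build a pan.bat call for one repository transformation.

    The options are assumed to match the kitchen.bat call in this thread,
    with /trans instead of /job; check them against your pan.bat usage text.
    """
    return [
        "pan.bat",
        f"/rep:{REPO}",
        f"/trans:{trans_name}",
        f"/dir:{directory}",
        f"/user:{USER}",
        f"/pass:{PASSWORD}",
        "/level:basic",
    ]


def run_with_subprocess(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_all(names: Sequence[str], runner: Callable[[List[str]], int]) -> List[str]:
    """Run the transformations in list order and stop at the first failure.

    `runner` executes one command and returns its exit code.
    Returns the names that completed successfully.
    """
    succeeded = []
    for name in names:
        if runner(pan_command(name)) != 0:
            break  # like an error hop: the remaining transformations are skipped
        succeeded.append(name)
    return succeeded
```

Usage would be along the lines of run_all(["trans_a", "trans_b"], run_with_subprocess), with hypothetical names. Stopping at the first failure mirrors following the error hop of a job entry, but the run order is still simply the order of the list, which is the dependency problem Sven raises in his next reply.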

  3. #3
    Join Date
    Dec 2006
    Posts
    15

    The transformations will most likely be run daily at midnight.

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    That's not what I meant... why do you want to run all transformations "automatically"?

    1) Usually, once your transformations are set up, they're not going to change that much.
    2) You probably have some relationships between your transformations (one needs to be run before the other).

    I've seen ETL systems (not Kettle) that use an automatic way, and it is usually worse than the explicit way.

    Regards,
    Sven

  5. #5
    Join Date
    Dec 2006
    Posts
    15

    Yes, the transformations are not going to change that often. Once they are developed, they are done. I am thinking about cascading the 30 transformations into 5 and then cascading the 5 into 1, then just adding that 1 transformation to a job. Will that work?

  6. #6
    Join Date
    May 2006
    Posts
    4,882

    Sure... in 2.5, as long as you're not using too many variables.

    Regards,
    Sven

  7. #7

    Problem running different transformations in one job

    Hello,
    I'm trying to run different transformations from one job, but I'm having trouble. I have one fact table, 3 dimension tables and 4 staging tables, and I'm thinking of doing something like this:
    1- First I delete all the data in the fact table.
    2- Load the dimensions using the truncate option, and load the 3 staging tables (these two transformations are independent and can be run at the same time).
    3- Once the 3 staging tables are loaded, I load data into the fourth staging table (data extracted from the 3 staging tables), which I will call stagingtable1.
    4- Once stagingtable1 and the dimensions are loaded, I load data into the fact table (data extracted from the dimensions and stagingtable1).

    My question is: what is the job entry to synchronize the transformations?
    At this URL (http://www.pentaho.com/images/job_screenshot.png) I saw a picture showing that it is possible, but I cannot find that job entry in my Kettle 2.5.0. I have a similar job entry with a (JavaScript) shape and a question mark; I don't know if that's it or not.
    Right now I'm doing something that I don't think is the right solution: I execute all the transformations sequentially, although the dimensions and the 3 staging tables are independent.

    I need help.
    Thanks in advance!
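The load plan in this post can be written down as explicit dependencies and grouped into stages: steps in the same stage could run in parallel, and each stage has to wait for the previous one. A minimal sketch using Python's standard graphlib module; the step names are hypothetical labels for the transformations described above, and the code only computes the order, it does not run anything in Kettle.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# The plan above as "step: steps it must wait for" (hypothetical step names).
PLAN: Dict[str, Set[str]] = {
    "delete_fact": set(),
    "load_dimensions": {"delete_fact"},
    "load_3_staging": {"delete_fact"},
    "load_stagingtable1": {"load_3_staging"},
    "load_fact": {"load_dimensions", "load_stagingtable1"},
}


def stages(plan: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into stages; steps within one stage don't depend on each other."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose prerequisites are done
        result.append(ready)
        sorter.done(*ready)
    return result


for number, stage in enumerate(stages(PLAN), start=1):
    print(number, stage)
```

Only the second stage (the dimensions and the 3 staging tables) contains more than one step, so that is the only place where running in parallel could save time; everything else has to stay sequential either way.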

  8. #8
    Join Date
    May 2006
    Posts
    4,882

    Forget the picture (it's wrong). Job entries run sequentially... Think of a job as a finite state automaton where only 1 job entry is active at a time.

    If you want to run parts of a job in parallel, you have to split your job into multiple jobs and synchronize them, e.g. using trigger files, starting the jobs at the same time via some scheduler.

    Regards,
    Sven
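A minimal sketch of trigger-file synchronization, assuming each parallel job can create a file as its last step (the file names are hypothetical). The downstream job waits until all expected trigger files exist, and the triggers are deleted before each run so that files left over from the previous night don't release the wait early.

```python
import time
from pathlib import Path
from typing import Iterable, List


def clear_triggers(paths: Iterable[Path]) -> None:
    """Remove trigger files left over from a previous run."""
    for path in paths:
        path.unlink(missing_ok=True)


def write_trigger(path: Path) -> None:
    """Signal that a job has finished (call this as the job's last step)."""
    path.write_text("done\n")


def wait_for_triggers(paths: List[Path], timeout_s: float, poll_s: float = 5.0) -> bool:
    """Poll until every trigger file exists.

    Returns True when all files are present, False if timeout_s expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if all(path.exists() for path in paths):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For the load plan in the previous post, the fact-table job would wait on, say, dims.done and staging.done, each written by one of the two parallel jobs started by the scheduler. The timeout keeps a failed upstream job from leaving the downstream job waiting forever.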

  9. #9

    Thanks for the quick reply, it was helpful! I was really confused by the picture. Now I think it's OK; I don't really need to split my job into subjobs.

    Thanks a lot!

    hamma

  10. #10

    How to schedule a job

    From Spoon, when I double-click on the "Start" job entry, it displays a form to fill in to schedule the job. I chose to schedule my job daily at 15:05 while the time on my computer was 15:00, and I deleted all data in my data warehouse. But after 15:05, when I ran a select on my data warehouse tables, they were still empty, so the scheduling didn't work.
    I don't know what the problem is. Maybe it's a time-format problem, because on my computer the time is shown from 0-12 while in Spoon it's 0-23, or maybe Pan is using a different time than my computer's, maybe GMT.

    Thanks in advance!

    hammamr
    Last edited by hammamr; 08-09-2007 at 09:29 AM.

  11. #11
    Join Date
    May 2006
    Posts
    4,882

    Don't use that "scheduler".

    It only works when you start the job some other way (crontab, at)... then it will wait until its time arrives. Scheduling via the Start entry will not start execution of the job by itself; it only makes the job wait if you start it some other way. So, in fact, if you use an external scheduler to schedule the job, you don't really need the schedule on the Start job entry.

    Regards,
    Sven
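The behaviour Sven describes can be modelled as a wait inside a process that something else has already started. A minimal sketch of that idea (not Kettle's actual code), assuming an hour:minute setting on a 24-hour clock as in the Spoon dialog; the function names are hypothetical. The wait only begins once the process is running, so if no external scheduler starts it, nothing ever runs.

```python
import time
from datetime import datetime, timedelta
from typing import Callable


def seconds_until(now: datetime, hour: int, minute: int) -> float:
    """Seconds from `now` until the next occurrence of hour:minute (24-hour clock)."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # time already passed today: wait for tomorrow
    return (target - now).total_seconds()


def run_at(hour: int, minute: int, job: Callable[[], None]) -> None:
    """Inside an already running process, sleep until hour:minute, then call job().

    Nothing happens unless an external scheduler (at, cron) starts this process.
    """
    time.sleep(seconds_until(datetime.now(), hour, minute))
    job()
```

With the clock at 15:00 and a schedule of 15:05, the wait is 300 seconds, but only if the process was already started before 15:05 by something else.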

  12. #12

    Scheduling jobs using the "at" command on Windows

    Thanks for the response. Now I'm trying to schedule my job using the "at" command from the DOS prompt.
    I have a .bat file in the Kettle directory: D:\pentaho\Kettle-2.5.0\run_mysql_caccia_Job.bat
    and I want to schedule the execution of this file. The content of the file is the following (between the two lines):
    -------------------------------------------------------
    D:
    cd D:\pentaho\Kettle-2.5.0
    kitchen.bat /rep:Ripository_Kettle_MySQL /job:Update_mysql_caccia_Job /dir:/ /user:admin /pass:admin /level:basic
    -----------------------------------------------------

    So I use the command line:
    at 15:18 "D:\pentaho\Kettle-2.5.0\run_mysql_caccia_Job.bat"
    to run my job each day at 15:18, but after 15:18, when I check my database, I see that no data has been loaded.
    The .bat file works fine without scheduling, but with scheduling I couldn't get it to work.

    So maybe I'm doing something wrong or I'm missing something!

    Thanks in advance!

  18. #18

    Sorry!

    I had an Internet connection problem; that's why the same message was posted many times.

    Sorry.

  19. #19
    Join Date
    May 2006
    Posts
    4,882

    Redirect the output from Kitchen (the > hint given before) and check what the result is.

    Regards,
    Sven

  20. #20

    Same result: I redirected the output, but it only shows that a new process has been added, "Aggiunto un nuovo processo con ID = 1" ("Added a new process with ID = 1"), and nothing about the ETL.

    Thanks in advance

    Hamma

  21. #21
    Join Date
    May 2006
    Posts
    4,882

    lol... redirect the output of kitchen.bat itself, not of the at command. Maybe you'll see that some variable is not defined, or that it doesn't find your JDK, ...

    Regards,
    Sven
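A minimal sketch of Sven's suggestion, assuming you prefer a small Python wrapper that the scheduler calls instead of the .bat file directly: it runs the command and appends everything the command prints (stdout and stderr) plus its exit code to a log file, so errors such as a missing JDK show up even when nobody is watching a console. In plain batch, the same effect comes from appending "> kitchen.log 2>&1" to the kitchen.bat line. The log file name is hypothetical. Also keep in mind that jobs started by at run under a different account (SYSTEM by default) than your interactive session, so environment variables or paths that work for you may be missing there.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def run_and_log(cmd: List[str], log_file: Path, cwd: Optional[str] = None) -> int:
    """Run cmd, append its stdout, stderr and exit code to log_file, return the code.

    Capturing the command's own output is what reveals problems such as a
    missing JDK or an undefined variable when the job runs unattended.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    with log_file.open("a", encoding="utf-8") as log:
        log.write(f"--- {datetime.now().isoformat()} {' '.join(cmd)}\n")
        log.write(result.stdout)
        log.write(result.stderr)
        log.write(f"--- exit code {result.returncode}\n")
    return result.returncode
```

For the job in this thread, the call would pass the full path D:\pentaho\Kettle-2.5.0\kitchen.bat followed by the same /rep, /job, /dir, /user, /pass and /level arguments, with cwd set to D:\pentaho\Kettle-2.5.0. Using the full path matters because on Windows the cwd argument does not affect where the executable is looked up.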

  22. #22

    PDI (Kettle), Talend, SAS, BO

    Thanks for the help.
    Now I'm asked to evaluate PDI and Talend and benchmark them against proprietary ETL tools (SAS & BO). I designed and implemented my data warehouse, then I used PDI to load data from the source database on Oracle into the target data warehouse on MySQL (localhost), and everything was fine. I did the same with Talend and it was OK. But the company I'm working for is asking whether the following features exist:
    - A standard for exporting metadata, like the one used by SAS (CWM)
    - LDAP for authentication
    - External schedulers
    - Impact analysis (I saw something about impact analysis in the Transformation menu in Spoon, but I don't know how it works or how it compares to the proprietary tools)
    They also ask for an overview of the PDI architecture.

    If anyone can help me...
    Thanks in advance!
    hamma

  23. #23
    Join Date
    May 2006
    Posts
    4,882

    A thread hijacker... make a new thread.

    Regards,
    Sven

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.