Hitachi Vantara Pentaho Community Forums

Thread: Overriding parameters in transformations from jobs

  1. #1
    Join Date
    Feb 2009
    Posts
    16

    Default Overriding parameters in transformations from jobs

    I have a job with a single transformation T1. I had expected that if I set a default value for a parameter within T1, I could still set a value for the same named parameter from the job and thereby override T1's default. But it appears that T1 keeps using the default value set at the transformation level.

    The use case is simple: I'd like to be able to set a filename in a parameter when I'm working with just the transformation (e.g., parameter 'fname' set to 'c:\testfile.txt'). However, when I use the transformation within a job, I'd like to be able to set 'fname' to 'c:\prodfile.txt' and have the transformation's value overridden.
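    Roughly, what I'm trying to do (just a sketch; the paths are made up, and I'm assuming pan.sh accepts the same -param syntax that kitchen.sh does):

    Code:
    # run the transformation on its own, picking up its default 'fname'
    sh pan.sh -file=/path/to/T1.ktr

    # run it from the job, overriding 'fname' for production
    sh kitchen.sh -file=/path/to/J1.kjb -param:fname=c:/prodfile.txt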

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    Should be possible... how about attaching your examples.

    Regards,
    Sven

  3. #3
    Join Date
    Feb 2009
    Posts
    16

    Default sample job and transformation

    Here's an example of what I was talking about. I've tried to override the transformation-set value for the parameter 'v1', but it always uses the transformation's value.
    Attached Files

  4. #4
    Join Date
    Feb 2009
    Posts
    16

    Default

    I'm really surprised this doesn't work. Is there a different method to write transformations and then drive their parameters (e.g., environment variables) at runtime via job settings?
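    For what it's worth, I know a value can be set globally in kettle.properties, e.g. (default location shown; KETTLE_HOME can relocate it):

    Code:
    # $HOME/.kettle/kettle.properties
    fname=c:/testfile.txt

    ...but that's environment-wide, not something a job can override per run, which is what I'm after here.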

  5. #5
    Join Date
    May 2006
    Posts
    4,882

    Default

    Raise a JIRA. I've attached an example of the way it was originally intended to be used (with the job overriding the transformation parameter).

    Regards,
    Sven
    Attached Files

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Seems to work just fine...

    Code:
    matt@Hannelore:~/svn/kettle/branches/3.2.0/distrib$ sh kitchen.sh /file:/tmp/Job_Param_Replace_Example/Job_1_001.kjb /param:v1=foo
    INFO  01-07 11:31:28,512 - Kitchen - Start of run.
    INFO  01-07 11:31:30,581 - Using "/tmp/vfs_cache" as temporary files store.
    INFO  01-07 11:31:30,980 - Job 1 - Starting entry [Transformation 1]
    INFO  01-07 11:31:30,981 - Transformation 1 - Loading transformation from XML file [file:///tmp/Job_Param_Replace_Example/Transformation_1_002.ktr]
    INFO  01-07 11:31:31,219 - Transformation 1 - Dispatching started for transformation [Transformation 1]
    INFO  01-07 11:31:31,225 - Transformation 1 - This transformation can be replayed with replay date: 2009/07/01 11:31:31
    INFO  01-07 11:31:31,228 - Get Variables.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
    INFO  01-07 11:31:31,230 - Write to log.0 -
    ------------> Linenr 1------------------------------
    z = foo
    
    ====================
    INFO  01-07 11:31:31,233 - Write to log.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
    INFO  01-07 11:31:31,326 - Job 1 - Starting entry [Success 1]
    INFO  01-07 11:31:31,326 - Job 1 - Finished job entry [Success 1] (result=[true])
    INFO  01-07 11:31:31,327 - Job 1 - Finished job entry [Transformation 1] (result=[true])
    INFO  01-07 11:31:31,327 - Kitchen - Finished!
    INFO  01-07 11:31:31,327 - Kitchen - Start=2009/07/01 11:31:30.804, Stop=2009/07/01 11:31:31.327
    INFO  01-07 11:31:31,327 - Kitchen - Processing ended after 0 seconds.

  7. #7
    Join Date
    May 2006
    Posts
    4,882

    Default

    For the original poster... exactly which version are you using? If you are using a 3.2 RC version, upgrade to the 3.2 GA/stable version.

    Regards,
    Sven

  8. #8
    Join Date
    Feb 2009
    Posts
    16

    Default

    I am running the 3.2.0 GA release on Mac using Java 1.6.0_13.

    I just played around with this again and ran it from the command line as well. When running the job, I'm still seeing the v1 value set in the transformation and not the job-set one. I'll look for the example that Matt refers to above and check whether I'm doing something differently.

    Thanks for the responses!!
    -john

  9. #9
    Join Date
    Feb 2009
    Posts
    16

    Default setting a parameter in the transformation blocks substitution

    I finally got some time to troubleshoot this; these are my observations:

    Assuming you have a transformation that references a variable via ${var} syntax:

    1) If the value is set via Transformation->Settings->Parameters, then the value is essentially static and won't be overridable from a job using that transformation UNLESS you specifically set it via the parameter tab on the transformation step within the job. This means that if you have an environment variable you'd like to set once in a job and have carry through to all child transformations, using the 'parameter' functionality is probably not recommended (since you'd have to repeat the value setting for each transformation step in the job).

    2) If the value is set via a 'Set Variables' step in the job, then child transformations can use the variable's value; but this means you can't independently run the transformation outside the context of the job.

    I'd just like to ask again, given the above: am I missing something here? For example, how would you set up a single job and single transformation where some environment variable (e.g., an extract file directory) can be set and used independently within the transformation, but overridden from the job?

    thanks,
    -john

  10. #10
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Well, why would you actively set variables that are defined as parameters?
    What is the use case for that? Why would you even want to try it?

  11. #11
    Join Date
    Apr 2008
    Posts
    146

    Default Using parameters in Jobs and Transformations

    I am actively trying to understand how to pass parameters to a job from the command line and have those parameters take effect in the sub-transformations and sub-job(s).

    Our development steps looked like this.

    Step 1: (Theme: Do one, while eyeing the many.)
    Our use case: we started with a transformation that did some useful processing for a building we manage. We have 32 similar buildings.

    Step 2: (Theme: Freedom from Spoon. Do the whole thing in a shell script.)
    We encapsulated our transformation in a job ("Do processing for a single facility"), which was later encapsulated in a job called "Do processing for all facilities".

    The structure here has to do with when variables are actually usable. It's that trick of initializing variables in one transformation and then using them somewhere downstream.

    This worked in Spoon, but our current design sets "DatabaseName" near the head of the snake. Near the tail of the snake we set a "BatchID" and use a date parameter to dictate how many records to process, i.e., do I process only a certain short timeframe or everything up until today's date?

    --- The point ---

    I want to execute the job with some parameters that override whatever is in the transforms and jobs below the main calling job.

    How might I accomplish this?

  12. #12
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Smoodo,

    You didn't mention what version you are using.
    Depending on the version, there are different ways of dealing with it.

    Assuming that you are using PDI 3.2 (or newer), you should look at http://wiki.pentaho.com/display/EAI/Named+Parameters
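    For example, passing a named parameter from the command line looks like this (the -param syntax also appears elsewhere in this thread; the -listparam option is from memory, so check that your build has it):

    Code:
    # pass a named parameter into the job
    sh kitchen.sh -file=/path/to/job.kjb -param:MY_PARAM=some_value

    # list the named parameters the job defines, with defaults and descriptions
    sh kitchen.sh -file=/path/to/job.kjb -listparam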

  13. #13
    Join Date
    Feb 2009
    Posts
    16

    Default

    Version 3.2 GA here:

    I started this post several months ago, but just to refresh my memory I looked at the link you provided, ran a quick test, and reconfirmed that this doesn't work... at least not here. I don't know how to explain it any more simply than this:

    You have a transform with a named parameter, let's call it DATA_DIR, for which you set a default value of '/home/fred/tmp_1'. The transform uses this parameter to determine which directory to read a particular file from... let's just say it looks for a file named ${DATA_DIR}/data.csv.

    Now you can run this, and if you have a file located at /home/fred/tmp_1/data.csv, all is well and the transform runs as you'd expect.

    Now you create a job which uses the above transformation, but you want to override ${DATA_DIR} so that you can read the csv file from a different location. In the job's parameter settings, you define DATA_DIR as '/home/fred/tmp_2'.

    Now you run the job and instead of reading the file at /home/fred/tmp_2/data.csv, it still reads the file at /home/fred/tmp_1/data.csv.

    Just for grins, I went into the transformation and deleted the parameter there. If I re-run the job, it now reads the file at /home/fred/tmp_2.
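    For reference, the parameter definition lives in the .ktr file itself. A sketch of what mine looks like (from memory, so treat the exact tag names as approximate):

    Code:
    <parameters>
      <parameter>
        <name>DATA_DIR</name>
        <!-- the default that keeps winning over the job's value -->
        <default_value>/home/fred/tmp_1</default_value>
        <description>Directory to read data.csv from</description>
      </parameter>
    </parameters>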

    Now the use case here is that I'd like to:

    1) test run a transformation on its own using a default parameter value
    2) be able to embed the transformation in a job and override its default parameter (which is kind of why I thought they were called "default values")

    This doesn't seem to be working properly; or I'm just off-base here and am trying to get some functionality that is implemented completely differently in this tool.

    -john

  14. #14
    Join Date
    Apr 2008
    Posts
    146

    Default Parameters continued:

    Using PDI version 3.2 GA

    The example files in Job Param Replace Example.zip above work for me when executing them as local files outside of the repository. When they are in the repository, the parameters are not passed at all.

    Why would the behavior differ between executing files locally vs in a repository?

    The command line that I used when executing from the terminal is:

    ./kitchen.sh -file=/home/dtruty/Documents/Spoon\ Jobs/Job_1_001.kjb -level=Debug -param:v1=MYDBNAME

    That worked. There is output in the log indicating that the variable was passed.

    When trying to execute the very same files in a repository, the variable just takes on the default value stored in the transformation. If I delete the parameter from the transformation and try to run the job with a parameter, then all that shows up is z=${v1}.

    The command line to execute from the repository was this:

    ./kitchen.sh -log=/tmp/log.txt -level=Debug -rep="SG Repository" -user=admin -pass=admin -dir="/Pentaho Samples" -job="Job_1_001" -param:v1=Foo

    Is this a bug? What am I missing?
    Last edited by Smoodo; 10-08-2009 at 05:17 PM.

  15. #15
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Grrr. Sorry - I feel like I sent you down the wrong path.

    I believe that I saw a bug at jira ( http://jira.pentaho.com ) about it, but can't find it now.
    Last edited by gutlez; 10-08-2009 at 05:50 PM.

  16. #16

    Default same issue

    Hi,

    I ran into the same issue and it took me a while to figure out what was going on.

    I have a transformation that reads the db information through variables ${db_user_name}, etc. The variables are set by reading a configuration file and using setVariable(...) in a separate transformation.

    The job uses the following transformations:

    1) Set variables from conf file
    2) Execute transformation

    I configured the variables as named parameters in my transformation for testing purposes and didn't bother to delete them, as I figured they'd be overridden in the first step of my job. To my surprise, this didn't happen.

    Matt's comment seems to indicate that this is by design. This doesn't seem particularly intuitive, and it's a source of errors that is hard to locate.

    However, I just found the following workaround that does the trick for me: at the job level, I explicitly set the parameter-value pairs in the transformation dialog at the bottom, e.g. parameter "data.directory", value "${data.directory}". This sets the parameter to the value that was read into the ${data.directory} variable by the first transformation.
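    In the .kjb, that mapping on the transformation job entry ends up looking roughly like this (a sketch from memory; tag names may vary between versions):

    Code:
    <parameters>
      <parameter>
        <name>data.directory</name>
        <!-- pass the variable set earlier in the job through to the transformation's parameter -->
        <value>${data.directory}</value>
      </parameter>
    </parameters>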

    I am on PDI 3.2 GA.
    Last edited by Daniel Weimer; 11-16-2009 at 05:22 PM.

  17. #17

    Default

    I came across this thread having had the same issue. This is my use case:

    - I have a number of transformations, all of which are parameterized so that they can accept dynamic input.
    - Within the transformations, I set default values for the transformation parameters so that I can run the transformations independently for testing, and also document the parameters expected by each transformation.
    - I have a job that makes use of these parameterized transformations. Within the job, I set all the parameters for all the child transformations in the job to default values. This allows me to test the entire job with all the parameterized transformations within.
    - Finally, when I run the job from the command line, I want to be able to specify parameter values to override the defaults in the job. The values passed in from the command line would then override the defaults set in the job, and override the defaults set in the child transformations as well.
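    To spell out the precedence I'm expecting (values invented for illustration):

    Code:
    # transformation default:              DATA_DIR=/defaults/dir
    # job default (overrides the above):   DATA_DIR=/job/dir
    # command line (overrides everything):
    sh kitchen.sh -file=/path/to/main.kjb -param:DATA_DIR=/prod/dir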

    Just like the other posters on this thread, I've found that any default parameter values set in the transformation override the parameters set in the parent job. This means that in order to parameterize a transformation intended for use in a parent job, I have to remove the parameters from the transformation and can only set their values (default or otherwise) in a parent job.

    This situation makes testing parameterized transformations inherently more difficult than it needs to be, since I would have to keep entering parameters on every test cycle for the transformations. It also means that the metadata about which parameters a transformation expects cannot be stored in the Parameters tab inside the transformation, which is a shame.

    Can somebody confirm whether this is expected behavior, and if so, what the rationale for it is? Also, if there is an alternate method for parameterizing transformations and overriding those parameters from a parent job (while still supporting easy testing of the transformations by themselves), can somebody describe the process for me?

    Thanks,

    - max

  18. #18
    Join Date
    Feb 2009
    Posts
    16

    Default issue is with running from a repository...

    Sorry, just getting back to these forums after a long while.

    I think it's been said somewhere before, but overriding transformation-level parameters with job-level parameters does work when you're not running Spoon with a repository (i.e., just file-based). I'm not sure why, but I've verified that (on 3.2 GA, Windows and Mac).

  19. #19

    Default

    Quote Originally Posted by jkereszturi View Post
    Sorry, just getting back to these forums after a long while.

    I think it's been said somewhere before, but overriding transformation-level parameters with job-level parameters does work when you're not running Spoon with a repository (i.e., just file-based). I'm not sure why, but I've verified that (on 3.2 GA, Windows and Mac).
    I'm using Spoon 3.2.0-GA without a repository on Mac OS X, and the transformation parameter defaults are not being overridden by the parameters specified in the job, from what I'm seeing. The only solutions I have at this point are to (a) not set defaults for parameters in the transformation, and set up a test harness job with defaults set as a convenience, or (b) always run the transformations from the command line with the parameters specified during testing. Obviously, neither solution is ideal.

    Have you tested that *defaults for* parameters are overridden, or just that *parameters* are overridden? So long as I don't set defaults, everything works as expected.

    What's the expected behavior for the application? Is this a bug or an intended feature?

  20. #20
    Join Date
    Feb 2009
    Posts
    16

    Default

    I just found my old test job and re-ran it on the Mac (using 3.2 GA). When running without a repository, it overrides the parameters as I'd remembered. Let me know if you want an example and I'll post a zip.

  21. #21

    Default

    Please do, that would be great. I'll take a look and compare it with what I'm doing in my job.

  22. #22
    Join Date
    Feb 2009
    Posts
    16

    Default

    Here's a zip file containing just a simple example.

    Transformation T1 reads from ${WRK_DIR}/Book1.xls and writes to a text file 'out' in ${WRK_DIR}. In T1, the WRK_DIR parameter is set (on my machine) to /Users/jkereszturi/tmp/KettleTest/t1. As expected, running this transformation will use that directory setting.

    Job J1 just calls T1, but in the job settings the WRK_DIR parameter is set to /Users/jkereszturi/tmp/KettleTest/t2. If you run this without using a repository, it reads and writes from the .../t2 directory. If you run it from a repository, it ignores the job's parameter setting and continues to use the .../t1 directory.

    A few notes: Be sure to change the directory paths in WRK_DIR for your setup. You'll also need to change the 'Transformation Filename' in J1's job settings to point to wherever you put the KettleTest zip contents. Finally, if you import into a repository, be sure to empty out that same field; otherwise it won't use J1 from the repository but will still point to that filename.

    -john
    Attached Files
