Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Help with the specifics of VARIABLES please

  1. #1
    Join Date
    Feb 2015
    Posts
    19

    Default Help with the specifics of VARIABLES please

    Hi,
    I am really getting confused about how to use variables.
    I have consulted many posts and am still no wiser as how variables are declared, set and typed.
    Many posts show 'Set Variables' steps as 'set Environment Variables' and these have columns of Field nams and Variable Name.
    I am using Kettle-Spoon 5.3.0.0 and my 'Set Variables' steps have 'Variable name' and 'Value'
    then there is the wiki (http://wiki.pentaho.com/display/EAI/Set+Variables) which shows even more attributes like 'Apply formatting' !


    Rules I have gleaned so far are
    * you cant set a variable in the same transformation as it used
    * its not so much a declarative as a setting "It accepts one (and only one) row of data to set the value of a variable."


    What I am trying to do.....
    there is a database table keyed on date with a 'needed flag' and a 'done flag'.
    ....for each Date where Needed=Y and Done=N
    ...........do the work - in this case take a snapshot fact as-at this date
    ............mark the date as 'Done'=Y
    .....end for


    Until I get the date passing working the 'do the work' step is not in the flow
    It starts with a table input step and what I expect is to able to use a variable
    in its where clause. something like where LS.AS_OF_DATE = '${FACT_LVE_as_at_date_txt}'


    So far my Pentaho structure is a job
    * Start
    * Run transform 'select date' (with no advanced option set)
    * Set Variables
    * Run transform 'mark off date'


    transform 'select date'...
    * has a table input step that selects the dates
    * has a 'copy rows to result' step

    Set Variables
    * has no content - odd in the extreme but many posts say this.

    Transform 'mark off date'
    * Get variables - attempting to get the variable
    * a 'get system Info' and formula step (to add constants)
    * an Update step


    Running this gets various errors which leads me to the questions
    1. Where is a variable given a name ?
    2. Where is a variable given a data type / format ?
    3. How do you get a variable into the list ie 'press CTRL Space to select a variable to Insert' ?
    4. The whole looping appears to be controlled by 'Copy Rows to Result' in lower level transform
    from a documentation / maintenance viewpoint this does not seem sound


    Ok so now feel free to tell me where I have gone off the rails - and or a much simpler way to achieve the whole thing.


    Thanks
    JC

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by jc508 View Post
    Many posts show 'Set Variables' steps as 'set Environment Variables' and these have columns of Field nams and Variable Name.
    I am using Kettle-Spoon 5.3.0.0 and my 'Set Variables' steps have 'Variable name' and 'Value'
    then there is the wiki (http://wiki.pentaho.com/display/EAI/Set+Variables) which shows even more attributes like 'Apply formatting' !
    There's a Step and a Job Entry - both named Set-Variables.
    The step will take its values from the fields in a row, the job entry will take constants.


    Quote Originally Posted by jc508 View Post
    Rules I have gleaned so far are
    * you cant set a variable in the same transformation as it used
    * its not so much a declarative as a setting "It accepts one (and only one) row of data to set the value of a variable."
    Each job has its own set of variables. A nested job or transformation inherits all the variables from the caller.
    Only a job's variables are modified, a transformation's variables are inherited once during initialization of that transformation.
    Of course you could have Set-Variables in a transformation with more than one row coming in, but obviously only the values from the last row processed would be effective.


    Quote Originally Posted by jc508 View Post
    Until I get the date passing working the 'do the work' step is not in the flow
    It starts with a table input step and what I expect is to able to use a variable
    in its where clause. something like where LS.AS_OF_DATE = '${FACT_LVE_as_at_date_txt}'
    You will have to reason about your design decision (variable vs. placeholder) sometime later...


    Quote Originally Posted by jc508 View Post
    1. Where is a variable given a name ?
    Since you are using a Set-Variables Job Entry, the variables are named there.

    Quote Originally Posted by jc508 View Post
    2. Where is a variable given a data type / format ?
    Variables are used for textual substitution. You can parse a variable to retrieve a data-type other than String, though.
    Typically, this is done in a transformation using a Get-Variables step, but other steps (Calculator, Scripting, ...) are able to do that as well.

    Quote Originally Posted by jc508 View Post
    3. How do you get a variable into the list ie 'press CTRL Space to select a variable to Insert' ?
    The Spoon Edit Menu provides some action items to control the set of variables available.
    Parameters will be listed , too.

    Quote Originally Posted by jc508 View Post
    4. The whole looping appears to be controlled by 'Copy Rows to Result' in lower level transform
    from a documentation / maintenance viewpoint this does not seem sound
    With the streaming concept at the heart of a transformation, implementing a counted loop by iterating over a tailored set of rows feels quite naturally to me.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Feb 2015
    Posts
    19

    Default

    Marabu,
    Thank you immensely for your feedback it greatly helps my understanding. Alas there are subsequent questions...

    Quote Originally Posted by marabu View Post
    There's a Step and a Job Entry - both named Set-Variables.
    The step will take its values from the fields in a row, the job entry will take constants..
    well knock me down with a feather - 2 objects of the same name, same icon and totally different behaviour - no wonder I get confused and no wonder the documentation doesn't make sense.

    Quote Originally Posted by marabu View Post
    Each job has its own set of variables. A nested job or transformation inherits all the variables from the caller.
    Only a job's variables are modified, a transformation's variables are inherited once during initialization of that transformation.
    Of course you could have Set-Variables in a transformation with more than one row coming in, but obviously only the values from the last row processed would be effective.
    So a job's variables are available in the transform. How? just by using the name from the job eg ${job-var1} or do they have to be retrieve with Get Variable ?

    If a variable cant be set and used in the same transformation then what is the use of the 'Set Variable (transformation)'? Can a Transformation have children of its own?

    Quote Originally Posted by marabu View Post
    You will have to reason about your design decision (variable vs. placeholder) sometime later...
    by placeholder do you mean parameters?
    the select already has things like
    , '${Internal.Transformation.Name}' AS my_transform_name
    so the choice of ${} things seemed sensible. Please advise if I have misinterpreted their use

    Name:  Job Load Fact.jpg
Views: 1224
Size:  26.8 KB
    Finally this job entry is silly because the flow of rows from the Select date transform does not influence the 'Set Variables (Job flavour)' in any way.
    so how do I get the dates to pass them along (the 'select_date' transform just copies its rows to output)
    Name:  select_date.jpg
Views: 1181
Size:  31.1 KB

    Thanks again
    JC

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by jc508 View Post
    So a job's variables are available in the transform. How? just by using the name from the job eg ${job-var1} or do they have to be retrieve with Get Variable?
    They are copied during initialization of a transformation.
    You can use them for textual substitution immediately everywhere that feature is available - look out for the Dollar-Icon in Spoon.
    You can use Get-Variables to add certain variable values as fields to the stream, if you need that.

    Quote Originally Posted by jc508 View Post
    If a variable cant be set and used in the same transformation then what is the use of the 'Set Variable (transformation)'? Can a Transformation have children of its own?
    The Set-Variable step is there to create variables from fields, not from constants.
    Comes in quite handy sometimes.

    Quote Originally Posted by jc508 View Post
    by placeholder do you mean parameters?
    I mean the question mark (?) you can put in your SQL statement to read values from fields.
    I don't know enough about your project to criticize your decision, just wanted to make sure you are aware of the options.

    Quote Originally Posted by jc508 View Post
    Finally this job entry is silly because the flow of rows from the Select date transform does not influence the 'Set Variables (Job flavour)' in any way.
    so how do I get the dates to pass them along (the 'select_date' transform just copies its rows to output)
    One way would be to use Copy-Rows-To-Result in your first transformation and let the second transformation do a field to parameter conversion in connection with the looping feature - see the advanced options in the Transformation settings.
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Feb 2015
    Posts
    19

    Default

    Thank you Marabu.

    I have got it to work.
    I think the summary is - there is no way to catch the data being returning from the 1st step and load it, row-by-row, into a VARIABLE that could be used by later steps.
    Basically because the Set Variables at the Job level only handles constants.

    The alternative is to use parameters into each child.
    For anybody reading this the lineage of the data is.
    * The column name(s) picked in the select statement in the 'pick dates' step becomes the stream column name in the parameters of the next step
    * the formal parameter name of the next step can be different
    * once a Job or Transform has Parameters declared then they seem to be available in lots of places with the ctrl-space combo
    .....* the parameter can be referred to directly eg ${p_FACT_LVE_as_at_date_txt} for, say, use in a select statement
    .....* the parameter can be retrieved Get Variables step in a transform - this makes the variable a field in the stream which is what you would need for an Update statement

    So a parameter at one layer becomes a variable at the next lower level.
    And to answer my own question - the way to get items into the Variables list (CTRL-Space) is to declare than as parameters to that transform.

    I will try and post a mini-tutorial on this so others dont have to follow the same trial and error trail.

    Thanks
    JC

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.