Hitachi Vantara Pentaho Community Forums

Thread: How to configure multiple database connections using config files?

  1. #1
    Join Date
    Jan 2016
    Posts
    9

    Default How to configure multiple database connections using config files?

    We use Kettle as an interface tool for importing data into our central (OLAP) reporting database. Since there are various data sources for reporting, we have a couple of different database connections to import from.

    Now I want to keep the kettle files free from database credentials so that the kettle files can run on the test server, then be deployed to the production server, using a different set of database connections.

    My vision was to use an XML file holding the various connection settings (one connection per node), which I read with the "Get data from XML" step.

    Code:
    <?xml version="1.0" encoding="UTF-8" ?>
    <connections>
        <connection>
            <host>dbhost1</host>
            <database>database1</database>
            <user>fred</user>
            <password>Encrypted 12321343423142314234234234</password>
        </connection>
        <connection>
            <host>dbhost2</host>
            <database>database2</database>
            <user>wilma</user>
            <password>Encrypted 987987987987987987987987987</password>
        </connection>
    </connections>
    Then follows a "Modified Java Script Value" step which dynamically sets connection parameters like:
    • DB1_HOST = 'dbhost1'
    • DB1_DATABASE = 'database1'
    • DB1_USER = 'fred'
    • NEXTDB_HOST = 'dbhost2'
    • NEXTDB_DATABASE = 'database2'
    • NEXTDB_USER = 'wilma'
    • etc.


    So the idea is: for each "row" in the XML config file (each <connection> node), create a set of Kettle variables XX_HOST, XX_DATABASE, XX_USER and XX_PASSWORD. These variables can then be used in subsequent database access steps.

    I can't figure out how to make this work. It seems to me that the "Modified Java Script Value" step can only populate pre-defined variables and cannot create new ones. However, I would very much like to generate these configuration settings dynamically.

    Has anybody an idea how I could accomplish this task?

    Thank you very much in advance! I am using kettle 6.0.0.
    Last edited by andimeier; 01-18-2016 at 11:44 AM.

  2. #2
    Join Date
    Aug 2011
    Posts
    360

    Default

    The JavaScript step has a setVariable(yourVarName, yourVarValue) function...
    you can actually set a variable of any name.
    You just need to add a <name> tag for each connection and then use it as a prefix to build
    your variable names, like prefix + "_HOST" etc.
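    To sketch that naming scheme in plain JavaScript (runnable outside Kettle; the connection values below are made up, and in the real step you would call setVariable() instead of collecting into an object):

```javascript
// Build the per-connection variable names from a <name>-style prefix.
// In the Kettle JavaScript step you would call setVariable(name, value)
// for each entry instead of returning an object.
function buildVariables(conn) {
  var vars = {};
  vars[conn.name + '_HOST'] = conn.host;
  vars[conn.name + '_DATABASE'] = conn.database;
  vars[conn.name + '_USER'] = conn.user;
  vars[conn.name + '_PASSWORD'] = conn.password;
  return vars;
}

// Made-up example connection
var sales = buildVariables({
  name: 'SALES', host: 'dbhost1', database: 'database1',
  user: 'fred', password: 'secret'
});
// sales.SALES_HOST is 'dbhost1', sales.SALES_USER is 'fred'
```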

    However, maybe the best way is to use a JNDI datasource. Then you only need to know the JNDI name in the PDI connection,
    which is the same in every environment, and you put your database configuration in a .properties file
    which is read by your JNDI provider (simple-jndi in Pentaho).
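    To illustrate, a minimal sketch of such a simple-jndi configuration (the JNDI name "SalesDB" and all connection details here are made-up examples, not from this thread; in PDI the file is jdbc.properties in the simple-jndi directory):

```properties
# simple-jndi jdbc.properties - one "<name>/<key>" entry per setting.
# "SalesDB" and all values below are made-up examples.
SalesDB/type=javax.sql.DataSource
SalesDB/driver=org.postgresql.Driver
SalesDB/url=jdbc:postgresql://dbhost1:5432/database1
SalesDB/user=fred
SalesDB/password=secret
```

    In the PDI connection dialog you would then pick the access type "JNDI" and enter SalesDB as the JNDI name on every environment.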

  3. #3
    Join Date
    Oct 2013
    Posts
    216

    Default

    You can define multiple datasource details in kettle.properties and use those variables in the PDI connection details. That way you are not required to change anything while migrating from test to production.
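    For example, such a kettle.properties (in the .kettle directory of the user running PDI) could look like this - the variable names are made-up examples, pick your own scheme:

```properties
# kettle.properties - same keys on test and production, different values.
# Variable names below are made-up examples.
SALES_DB_HOST=dbhost1
SALES_DB_PORT=3306
SALES_DB_NAME=database1
SALES_DB_USER=fred
SALES_DB_PASSWORD=Encrypted 12321343423142314234234234
```

    In the connection dialog you would then enter ${SALES_DB_HOST}, ${SALES_DB_NAME} etc. instead of literal values.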
    -- NITIN --

  4. #4
    Join Date
    Jan 2016
    Posts
    9

    Default

    Quote Originally Posted by Mathias.CH View Post
    The JavaScript step has a setVariable(yourVarName, yourVarValue) function...
    you can actually set a variable of any name.
    You just need to add a <name> tag for each connection and then use it as a prefix to build
    your variable names, like prefix + "_HOST" etc.
    This would be very interesting - but I did not get this to work. I tried this:

    [Attachment: read_xml.png]

    with the XML source being:

    Code:
    <?xml version="1.0" encoding="UTF-8" ?>
    <connections>
        <connection id="SALES">
            <host>sap1234.server.com</host>
            <database>SALES01</database>
            <user>sales</user>
            <password>Encrypted 503624375f6171c8eacaa90ec5789a8e024346742f1b6ffc8</password>
        </connection>
        <connection id="TARGET">
            <host>reportdb.server.com</host>
            <database>REPORT01</database>
            <user>report</user>
            <password>Encrypted 12345667868787acaa90ec5789a8e024346742f1b6ffc8</password>
        </connection>
    </connections>
    On the "Get data from XML" step, I used /connections/connection as the "Loop XPath", so this step emits 2 rows (representing 2 database connections), plus an additional field holding the connection ID (the "id" attribute of the XML <connection> tag):

    [Attachment: xml_fields.png]

    In the second step, my intention would be to dynamically set variables like this:

    Code:
    setVariable(connection_id + '_HOST', host, 'p');
    setVariable(connection_id + '_DATABASE', database, 'p');
    setVariable(connection_id + '_USER', user, 'p');
    setVariable(connection_id + '_PASSWORD', password, 'p');
    But I cannot tell whether this has succeeded.

    To me it seems like the variables have not been set. If I use them in a database operation later, the log complains:

    Code:
    2016/01/22 09:45:39 - Table input.0 - Access denied for user '${SALES_USER}'@'%' to database '${SALES_DATABASE}'
    Does this indicate that the variables have not been set?

    So, at the moment the following questions arise:

    1. How can I dump current variable values to the log or wherever?
    2. If my procedure really does not set the variables - what is wrong?


    However, maybe the best way is to use a JNDI datasource. Then you only need to know the JNDI name in the PDI connection,
    which is the same in every environment, and you put your database configuration in a .properties file
    which is read by your JNDI provider (simple-jndi in Pentaho).
    Sounds rational, and could really be my favorite way of achieving this. But I would like to keep the database connections for a set of jobs/transformations in the same place as the jobs/transformations, not globally. That way I can use different databases for different Kettle use cases. It seems a JNDI datasource must be configured in a global config file (user-global or system-global), whereas I would prefer a use-case-specific config.
    Last edited by andimeier; 01-22-2016 at 04:52 AM.

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    You can't set variables in the VariableSpace of a transformation.
    You only can modify the VariableSpace of a job or the JVM.
    The VariableSpace of a transformation is set up only once - during initialization of a transformation.
    Usually, you have one transformation to define your variables and other transformations in that job will use them.
    So long, and thanks for all the fish.

  6. #6
    Join Date
    Jan 2016
    Posts
    9

    Default

    Update:

    it does work! Yippie Yay!

    I just had to set the variables in JavaScript with the scope 'r' (for 'root job') instead of 'p' ('parent job'). Like so (this is the content of the editor field of the "Modified Java Script Value" step):

    Code:
    var prefix = 'DB_';
    var varname;

    // set the logLevel to write the log message,
    // see http://forums.pentaho.com/archive/index.php/t-52142.html for details
    // l ... detailed
    // r ... row-level
    // m ... minimal
    // d ... debug
    // e ... error
    var logLevel = 'l'; // l=detailed

    varname = prefix + connection_id + '_HOST';
    writeToLog(logLevel, 'Set ' + varname + ' to ' + host);
    setVariable(varname, host, 'r');

    varname = prefix + connection_id + '_DATABASE';
    writeToLog(logLevel, 'Set ' + varname + ' to ' + database);
    setVariable(varname, database, 'r');

    varname = prefix + connection_id + '_PORT';
    writeToLog(logLevel, 'Set ' + varname + ' to ' + port);
    setVariable(varname, port, 'r');

    varname = prefix + connection_id + '_USER';
    writeToLog(logLevel, 'Set ' + varname + ' to ' + user);
    setVariable(varname, user, 'r');

    varname = prefix + connection_id + '_PASSWORD';
    setVariable(varname, password, 'r');
    This way, the JavaScript step automagically creates new variables which correspond to the XML configuration.
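    The repetitive stanzas above could also be written as a loop over the field names. Below is a plain-JavaScript simulation (runnable outside Kettle) with a stubbed setVariable; in the real "Modified Java Script Value" step, Kettle itself provides setVariable() and the row fields, and the row values here are made up:

```javascript
// Stub for the job's variable space; the real Kettle step provides
// setVariable(name, value, scope) itself.
var variables = {};
function setVariable(name, value, scope) { variables[name] = value; }

// One row as emitted by the "Get data from XML" step (made-up values)
var row = { connection_id: 'SALES', host: 'dbhost1', database: 'database1',
            port: '3306', user: 'fred', password: 'secret' };

// Loop variant of the step: one setVariable call per configured field
var prefix = 'DB_';
var fields = ['host', 'database', 'port', 'user', 'password'];
for (var i = 0; i < fields.length; i++) {
  var varname = prefix + row.connection_id + '_' + fields[i].toUpperCase();
  setVariable(varname, row[fields[i]], 'r');
}
// variables now holds DB_SALES_HOST, DB_SALES_DATABASE, ..., DB_SALES_PASSWORD
```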

    So, in order to introduce a new connection, you just have to:

    1. add a new <connection> element in the connections.xml file
    2. use the new variables in your database-related steps (e.g. "Table Input")


    That's all! The variables will automatically be generated from the <connection> element.

    Say, you introduce this XML block:

    Code:
        <connection id="DB2">
            <host>db2.database-server.com</host>
            <database>DB2DB</database>
            <port>3306</port>
            <user>readuser</user>
            <password>Encrypted 535609898afe8408e09f8e8540982f1b6ffc8</password>
        </connection>
    Then the following variables would be provided automatically by the above "Modified Java Script Value" step:

    • DB_DB2_HOST
    • DB_DB2_DATABASE
    • DB_DB2_PORT
    • DB_DB2_USER
    • DB_DB2_PASSWORD

    ... holding the configured values from the XML. Pretty cool ...

    You just have to use these new variables in your database-related steps.

    Note: since the variables are set in JavaScript and not via a "Set Variables" transformation step, no logging occurs. That means the password (assigned to DB_DB2_PASSWORD) will not show up in the log. For all other variables, I added a writeToLog instruction to log them, using the log level "Detailed" so that they do not show up with production log level settings of "Minimal" or "Error".

    Another note: I am not sure why setVariable(name, value, 'r') (set the variable in the root job context) works, but setVariable(name, value, 'p') (parent job context) does not. I considered the job which contains the transformation (holding the "set variable" code) to be the "parent job", thus exactly the level onto which I wanted to mount the variable. I don't know why this did not work. It seems I do not understand what exactly the "parent job" is ...

  7. #7
    Join Date
    Jan 2016
    Posts
    9

    Default

    Quote Originally Posted by marabu View Post
    You can't set variables in the VariableSpace of a transformation.
    You only can modify the VariableSpace of a job or the JVM.
    The VariableSpace of a transformation is set up only once - during initialization of a transformation.
    Usually, you have one transformation to define your variables and other transformations in that job will use them.
    Sorry, I do not understand. What is the "VariableSpace of a transformation"?

    Are you referring to my setVariable(name, value, 'p')? That would make the new variable available in the parent job. What do you mean by "VariableSpace of a transformation"?

  8. #8
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Variables internally are kept in a VariableSpace - each transformation and job has one.
    You can't set and use variables from inside the same transformation, because the VariableSpace of a transformation is filled only once during initialization.
    So long, and thanks for all the fish.

  9. #9
    Join Date
    Jan 2016
    Posts
    9

    Default

    Uhm ... you are referring to the fact that I cannot use variables in "subsequent" steps of the same transformation, do I understand correctly?

    If so, yes, I have one transformation which sets the variables dynamically and another transformation which uses them. Both are wired in a job:

    [Attachment: job.png]

    If this is what you meant to clarify, then thanks, I get it. I actually did it that way. After all, Spoon makes this behavior hard to miss, because a giant popup dialog splashes up saying something like "YOU CANNOT USE VARIABLES IN THE SAME TRANSFORMATION IF THEY ARE SET HERE" ...

  10. #10
    Join Date
    Aug 2011
    Posts
    360

    Default

    Quote Originally Posted by andimeier View Post
    Uhm ... you are referring to the fact that I cannot use variables in "subsequent" steps of the same transformation, do I understand correctly?

    If so, yes, I have one transformation which sets the variables dynamically and another transformation which uses them. Both are wired in a job:

    [Attachment: job.png]

    If this is what you meant to clarify, then thanks, I get it. I actually did it that way. After all, Spoon makes this behavior hard to miss, because a giant popup dialog splashes up saying something like "YOU CANNOT USE VARIABLES IN THE SAME TRANSFORMATION IF THEY ARE SET HERE" ...
    Yep, that's correct for the understanding of the variable space. So normally, setting the variable on the parent job, not the root job, is the way to go and should work.
    If you set it on the root job, it will work, but you risk messing with great-grandparent jobs if you start to use this job as a sub-job of another one.
    Try to keep everything as "local" as possible; this will prevent weird, undebuggable bugs (and a lot of frustration).

  11. #11
    Join Date
    Jan 2016
    Posts
    9

    Default

    Quote Originally Posted by Mathias.CH View Post
    Yep, that's correct for the understanding of the variable space. So normally, setting the variable on the parent job, not the root job, is the way to go and should work.
    If you set it on the root job, it will work, but you risk messing with great-grandparent jobs if you start to use this job as a sub-job of another one.
    Try to keep everything as "local" as possible; this will prevent weird, undebuggable bugs (and a lot of frustration).
    I totally agree with you. Actually, I tried to do it like this in the first place, but setting the variable scope to 'p' (parent job) led to the variable not being known in the calling job. As I illustrated before, I used:


    1. a job JOB1
    2. the job JOB1 launches transformation A (set variables)
    3. then, the job JOB1 executes transformation B (use the variables)


    I don't understand why the variables set in A could not be found in B. That's why I changed the scope to setVariable(..., 'r') (root job). I don't like it that way, but it works; the other variant didn't. Maybe 'g' (scope = "grandparent job") would also work.

    But do you (or somebody else) see the reason why scope "parent job" did not work in my case? After all, the job JOB1 is the parent job of transformation A, or am I misunderstanding something?

  12. #12
    Join Date
    Feb 2013
    Posts
    7

    Default

    Hello, I have a similar problem, so I decided to describe it in this topic.

    I have created a job with a parametrized connection, structured roughly like this:
    job A:
    transformationA - assigns the variables $CONNECTION_HOST, $CONNECTION_DB_NAME, $CONNECTION_PORT, $CONNECTION_USER, $CONNECTION_PASS
    These variables are used to create the connection (PARAMETRIZED CONNECTION) to the database in the next step (jobB).
    jobB - reads from the database using the connection built from the variables; it contains a few transformations, let's name them transBA, transBB etc.
    transBA
    transBB - this transformation contains a Transformation Executor, let's name its sub-transformation transBBA
    transBBA - at this level my variables are empty; number of copies to start = 4
    transBC
    transBD
    transBDA

    I tried different scopes - parent and root job - and none of them works in transBBA. When I use Write to log to show the values of the variables, they are correct.
    But when I try to use the variables to create a database connection, e.g. a Table Input with the connection PARAMETRIZED CONNECTION, I get the error "No suitable driver found for jdbc:postgresql://".

    The only solution I found was to move the transformation transBBA one level up (to the level of the transformations transBA, transBB, transBC).
    I don't want to do that, because then it will not run in 4 copies at the same time, which increases execution time.

    I am using Kettle - Spoon 5.4 and PostgreSQL 9.3.

  13. #13
    Join Date
    Aug 2011
    Posts
    360

    Default

    Quote Originally Posted by nora View Post
    I tried different scopes - parent and root job - and none of them works in transBBA. [...] When I try to use the variables to create a database connection, e.g. a Table Input with the connection PARAMETRIZED CONNECTION, I get the error "No suitable driver found for jdbc:postgresql://".
    Seems to be specific to the trans executor step. I think you need to explicitly specify the variables you want to pass down to
    your sub-transformation by configuring them in the trans executor step dialog. Use your already defined variables as the values.

  14. #14
    Join Date
    Feb 2013
    Posts
    7

    Default

    Hi Mathias, thanks for the reply.
    I'm using those variables just as you wrote. The values are visible in the transformation called by the Transformation Executor, as I wrote before - I checked it using a Write to log step. Inside the transformation transBB, a Get Variables step reads those variables, and then I configure the Transformation Executor using the Parameters option. Additionally, I configured Parameters in the Transformation properties dialog of both transformations.

  15. #15
    Join Date
    Feb 2013
    Posts
    7

    Default

    Quote Originally Posted by Mathias.CH View Post
    Seems to be specific to the trans executor step.
    Not only the Transformation Executor behaves like that. When I was testing the Mapping step I had the same problem. Structure of the job with the Mapping step:
    1. jobA
    1.1 jobAB
    1.1.1 transABa - calls trans (1.1.1.1) using a Mapping step
    1.1.1.1 transABaa - calls trans (1.1.1.1.1) using a Mapping step
    1.1.1.1.1 transABaaa - at this level the values of the variables are empty, for example when used in a Table Input. A Write to log step at this level shows the values of the variables correctly.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.