Hitachi Vantara Pentaho Community Forums

Thread: Kettle 3.2.0 Integrated/Embedded in an application

  1. #1

    Kettle 3.2.0 Integrated/Embedded in an application

    Hi,

    I integrated Kettle 3.0.2 into my application and deployed it on an application server. With version 3.0.2, I ran into many threading issues.

    For example, consider the following scenario:
    I have 3 JBoss servers up and running, each with the capacity to run 5 transformation threads in parallel. So at any instant, we can have 15 transformations running in parallel for 15 different input files or inputs.
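
    For readers picturing the setup, here is a minimal sketch of how one server might cap parallel runs at 5, assuming a plain fixed-size thread pool. The class name TransformationPool and the placeholder task are hypothetical; they are not part of Kettle or of the application described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: none of these names come from Kettle.
class TransformationPool {
    static final int MAX_PARALLEL = 5; // per-server limit described in the post

    // Submits one task per input file, runs at most MAX_PARALLEL at a time,
    // and returns how many tasks finished without throwing.
    static int runAll(List<String> inputFiles) {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_PARALLEL);
        List<Future<?>> results = new ArrayList<>();
        for (String file : inputFiles) {
            // Placeholder for the real work, e.g. a call to runTransformation(...)
            results.add(pool.submit(() -> System.out.println("processing " + file)));
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        int succeeded = 0;
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
            for (Future<?> result : results) {
                try {
                    result.get();
                    succeeded++;
                } catch (ExecutionException e) {
                    // The task threw; count it as failed and keep going.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
        return succeeded;
    }
}
```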

    However, some threads failed with a NullPointerException in the StepLoader class.

    The trace is as given below:
    2009/07/13 07:10:14 - Pan - ERROR (version 3.0.2, build 536 from 2008/01/21 15:36:26) : at org.pentaho.di.trans.StepLoader.findStepPluginWithID(StepLoader.java:571)
    2009/07/13 07:10:14 - Pan - ERROR (version 3.0.2, build 536 from 2008/01/21 15:36:26) : at org.pentaho.di.trans.step.StepMeta.<init>(StepMeta.java:218)
    2009/07/13 07:10:14 - Pan - ERROR (version 3.0.2, build 536 from 2008/01/21 15:36:26) : at org.pentaho.di.trans.TransMeta.loadXML(TransMeta.java:2865)
    2009/07/13 07:10:14 - Pan - ERROR (version 3.0.2, build 536 from 2008/01/21 15:36:26) : at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2707)
    2009/07/13 07:10:14 - Pan - ERROR (version 3.0.2, build 536 from 2008/01/21 15:36:26) : at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2664)
    2009/07/13 07:10:14 - Pan - ERROR (version 3.0.2, build 536 from 2008/01/21 15:36:26) : at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2652)
    2009/07/13 07:10:14 - Pan - ERROR (version 3.0.2, build 536 from 2008/01/21 15:36:26) : at com.tdemand.server.datamanagement.common.DataLoadManagerServiceImpl.runTransformation(DataLoadManagerServiceImpl.java:464)


    Later, after looking at the StepLoader source, I saw that the class is a singleton but that its instance-creation methods, init() and getInstance(), were not thread-safe. I synchronized these methods and patched the change into my 3.0.2 jars, and the transformations now run fine when triggered from multiple threads on the application server.
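
    A minimal sketch of the kind of change described above, assuming a lazily created singleton. PluginRegistry is a hypothetical stand-in written for illustration; it is not the StepLoader source, and the plugin entry is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lazily initialized singleton (NOT Kettle's StepLoader).
class PluginRegistry {
    private static PluginRegistry instance;
    private final Map<String, String> plugins = new HashMap<>();

    private PluginRegistry() {
        // Made-up entry standing in for the real plugin scan.
        plugins.put("TextFileInput", "example plugin");
    }

    // Without 'synchronized', two threads can both observe instance == null,
    // or one thread can use the registry while another is still filling it.
    static synchronized PluginRegistry getInstance() {
        if (instance == null) {
            instance = new PluginRegistry();
        }
        return instance;
    }

    // Returns the plugin description, or null if the id is unknown.
    String find(String id) {
        return plugins.get(id);
    }
}
```

    Synchronizing the accessor serializes every call, which is usually an acceptable cost for a lookup that runs once per transformation load.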


    I am now planning to upgrade the Kettle jars to version 3.2.0. I am not sure whether my integration code is correct, or whether it was the cause of these threading issues.

    Could anyone please review my integration code and let me know whether it is the best way to embed Kettle in an application that will be deployed on an application server?


    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    import org.pentaho.di.core.exception.KettleException;
    import org.pentaho.di.core.logging.LogWriter;
    import org.pentaho.di.core.util.EnvUtil;
    import org.pentaho.di.job.JobEntryLoader;
    import org.pentaho.di.trans.StepLoader;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    // fileAppender, logger, INPUT_FILE, OUTPUT_FILE and ERROR_FILE are members of the enclosing class.
    private void runTransformation(String inputUrl, String outputUrl, String failedUrl, String logfile,
            String loglevel, String transFolder, String transjobFilename) throws KettleException {

        // Attach a file appender for this run to the shared Kettle log writer.
        LogWriter log = LogWriter.getInstance(3);
        fileAppender = LogWriter.createFileAppender(logfile, true);
        log.addAppender(fileAppender);
        try {
            // Needed for plugin/step loading.
            StepLoader.init();
            JobEntryLoader.init();
            EnvUtil.environmentInit();

            // Load the transformation definition from its XML file.
            TransMeta transMeta = new TransMeta(transjobFilename);
            Trans trans = new Trans(transMeta);
            trans.initializeVariablesFrom(null);
            Date start = new Date();

            // Pan-style arguments passed to the transformation.
            List<String> args = new ArrayList<String>();
            args.add("-file=" + transjobFilename);
            args.add("-logfile=" + logfile);
            args.add("-level=" + loglevel);

            trans.setVariable("ABC", String.valueOf(1));
            trans.setVariable("DEF", String.valueOf(2));
            trans.setVariable("GHI", String.valueOf(3));
            trans.setVariable("JKL", String.valueOf(4));
            trans.setVariable("TRANS_ROOT", transFolder);

            log.logDetailed("TdPan", "Loading transformation from XML file [" + transjobFilename + "]", new Object[0]);
            trans.setVariable(INPUT_FILE, inputUrl);
            trans.setVariable(OUTPUT_FILE, outputUrl);
            trans.setVariable(ERROR_FILE, failedUrl);

            trans.getTransMeta().setInternalKettleVariables(trans);
            log.logMinimal("TdPan", "args: " + args + "\nInputFile: " + inputUrl + "\nOutputFile: " + outputUrl
                    + "\nErrorFile: " + failedUrl, new Object[0]);
            log.logMinimal("TdPan", "Start of run.", new Object[0]);

            // execute() only starts the step threads; wait for them before reporting the end.
            trans.execute(args.toArray(new String[args.size()]));
            trans.waitUntilFinished();
            trans.endProcessing("end");
            logger.info("End of Transformation Execution");
            log.logMinimal("Pan", "Finished!", new Object[0]);

            Date stop = new Date();
            SimpleDateFormat df = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss.SSS");
            log.logMinimal("Pan", "Start=" + df.format(start) + ", Stop=" + df.format(stop), new Object[0]);

            // Divide before casting so a long-running transformation cannot overflow the int.
            long millis = stop.getTime() - start.getTime();
            trans.printStats((int) (millis / 1000));
        } finally {
            // Detach the appender even if the transformation fails.
            if (fileAppender != null) {
                fileAppender.close();
                log.removeAppender(fileAppender);
            }
        }
    }


    This method is invoked for each transformation thread. Please let me know whether StepLoader.init() should be invoked only once per JVM, or whether invoking it for every transformation is harmless.
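
    One way to make the question moot is to route initialization through a guard that runs it at most once per JVM, however many threads call it. A minimal sketch under that assumption: KettleBootstrap and its counter are hypothetical names, and the counter increment stands in for the real calls (StepLoader.init(), JobEntryLoader.init(), EnvUtil.environmentInit()), which are left out so the sketch runs without the Kettle jars.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical once-per-JVM initialization guard; not part of Kettle.
class KettleBootstrap {
    private static boolean initialized = false;

    // Counts how many times the real initialization body ran (demonstration only).
    static final AtomicInteger initRuns = new AtomicInteger();

    // Safe to call from every transformation thread: only the first call does work.
    static synchronized void ensureInitialized() {
        if (initialized) {
            return;
        }
        // Assumption: in the real application, the Kettle init calls would go here.
        initRuns.incrementAndGet();
        initialized = true;
    }
}
```

    Each transformation thread would then call KettleBootstrap.ensureInitialized() in place of the individual init calls.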

    Any help will be really appreciated.
    Thanks in advance

  2. #2

    Hi,

    I created a JIRA for this: http://jira.pentaho.com/browse/PDI-2633

    Also see http://wiki.pentaho.com/display/EAI/...a+API+Examples
    The existing Kettle component within the BI Server does the same thing you are doing.

    BTW: You could also use the Carte server for your use case.

    Thanks for reporting this,
    Jens

  3. #3

    So, is this a possible bug? I suppose the fix may be available in the next release.

    As a temporary fix, I can apply the same patch I used with Kettle 3.0.2 to version 3.2.0 and go ahead with the upgrade.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.