Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Various questions

  1. #1

    Default Various questions

    Hi - I have a number of questions on Kettle.

    1. What are the pros / cons of using the kettle repository over a file system for the job / transformations. We currently have them as files that are checked in / out of subversion.
    2. When running a job in Spoon, we are provided with the execution history. Is there a way to get Kettle to log each job / transformation run and its status to the database?
    3. We use a JNDI Connection in our transformations. I have noticed that one needs to initialize it prior to running the script or things fail. How does one ensure that the connection is valid when running in batch mode?
    Thanks

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    What are the pros / cons of using the kettle repository over a file system for the job / transformations. We currently have them as files that are checked in / out of subversion.
    Pros: a more shared environment (when you're not using a shared drive, for example), in that you could reuse connections in a repository (not via files), but that difference has more or less disappeared.

    Cons: the repository sometimes gets corrupted, and you can't easily check it into SVN, ...

    Use files; I do, especially in production.

    When running a job in Spoon, we are provided with the execution history. Is there a way to get Kettle to log each job / transformation run and its status to the database?
    In the options of a job/transformation you can specify a log table... one for jobs, one for transformations. Have a look at the auditing tip: http://kettle.pentaho.org/tips/?tip=9
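Once each run is logged to such a table, reporting on run status is just a query. Below is a minimal sketch using an in-memory SQLite stand-in; the column names (ID_BATCH, TRANSNAME, STATUS, ERRORS, STARTDATE, ENDDATE) follow the defaults Kettle generates for a transformation log table, but check your own log-table definition in the transformation settings dialog, and the sample rows are of course made up:

```python
import sqlite3

# In-memory stand-in for a Kettle transformation log table.
# Column names mimic Kettle's generated defaults; verify against
# your actual log-table DDL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trans_log (
        ID_BATCH   INTEGER,
        TRANSNAME  TEXT,
        STATUS     TEXT,
        ERRORS     INTEGER,
        STARTDATE  TEXT,
        ENDDATE    TEXT
    )
""")
conn.executemany(
    "INSERT INTO trans_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "load_customers", "end",  0, "2007-06-01 01:00", "2007-06-01 01:05"),
        (2, "load_orders",    "stop", 3, "2007-06-01 01:05", "2007-06-01 01:06"),
    ],
)

# Latest status per transformation -- a report similar to what
# Spoon's execution-history tab shows.
rows = conn.execute("""
    SELECT TRANSNAME, STATUS, ERRORS
    FROM trans_log
    WHERE ID_BATCH IN (SELECT MAX(ID_BATCH) FROM trans_log GROUP BY TRANSNAME)
    ORDER BY TRANSNAME
""").fetchall()
for name, status, errors in rows:
    print(f"{name}: {status} ({errors} errors)")
```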

    We use a JNDI Connection in our transformations. I have noticed that one needs to initialize it prior to running the script or things fail. How does one ensure that the connection is valid when running in batch mode?
    It's a known problem... log on to JIRA and add your comments/vote for http://jira.pentaho.org/browse/PDI-46
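For what it's worth, in batch mode Kettle resolves JNDI names through the bundled simple-jndi directory rather than an application server, so the data source has to be defined in the jdbc.properties file there. A sketch of such an entry (the data source name, driver, and URL below are illustrative, not taken from this thread):

```
# simple-jndi/jdbc.properties (illustrative names and URL)
MyDatasource/type=javax.sql.DataSource
MyDatasource/driver=org.postgresql.Driver
MyDatasource/url=jdbc:postgresql://dbhost:5432/warehouse
MyDatasource/user=etl_user
MyDatasource/password=secret
```

The JNDI name used in the connection dialog ("MyDatasource" here) must match the prefix of these keys.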

    Regards,
    Sven

  3. #3

    Default

    Thanks - what problems did you observe when using SVN?

    I saw the log tab, but could not identify how to relate the job, sub-jobs, and transformations. Is the BATCH_ID the same for the entire thread - i.e. if a job has sub-jobs that have transformations, will the BATCH_ID be the same across them? I did not notice a parent ID, so I am assuming that I would have to use time to correlate the execution path. Is that correct?

    I believe that the JNDI problem does not occur when using the Repository, which may be a reason for using the repository.

    Could we use a mixture of repository and files? i.e. have the database connection defined in the repository while the job and transformation files are on the file system?

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    Thanks - what problems did you observe when using SVN?
    No problems, but if you want to use SVN you need files.

    I saw the log tab, but could not identify how to relate the job, sub-jobs, and transformations. Is the BATCH_ID the same for the entire thread - i.e. if a job has sub-jobs that have transformations, will the BATCH_ID be the same across them? I did not notice a parent ID, so I am assuming that I would have to use time to correlate the execution path. Is that correct?
    Batch id and transformation id are incremental (select max()+1 from logtable...). Two issues:
    1) the ids are not 100% guaranteed to be unique when you run jobs/transformations in parallel
    2) if you run a transformation on its own, the batch id will be the transformation id (in 2.5; it's fixed in 3.0)
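Since there is no parent-ID column in these log tables, the time-based correlation the question suggests is one workaround: attribute a transformation run to the job run whose start/end window contains it. A minimal sketch (the record layout and field names are illustrative, not Kettle's own schema):

```python
from datetime import datetime

def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp string."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

def correlate(job_runs, trans_runs):
    """Map each transformation run to the enclosing job batch, if any."""
    result = {}
    for t in trans_runs:
        t_start, t_end = parse(t["start"]), parse(t["end"])
        for j in job_runs:
            # A transformation belongs to a job if its whole run
            # falls inside the job's start/end window.
            if parse(j["start"]) <= t_start and t_end <= parse(j["end"]):
                result[t["name"]] = j["batch_id"]
                break
        else:
            result[t["name"]] = None  # no enclosing job run found
    return result

jobs = [{"batch_id": 7, "start": "2007-06-01 01:00:00", "end": "2007-06-01 02:00:00"}]
trans = [
    {"name": "load_customers", "start": "2007-06-01 01:05:00", "end": "2007-06-01 01:10:00"},
    {"name": "nightly_extract", "start": "2007-06-01 03:00:00", "end": "2007-06-01 03:10:00"},
]
print(correlate(jobs, trans))
# {'load_customers': 7, 'nightly_extract': None}
```

Note that this breaks down exactly where Sven warns it will: jobs running in parallel can have overlapping windows, making the attribution ambiguous.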

    I believe that the JNDI problem does not occur when using the Repository, which may be a reason for using the repository.
    Maybe when both metadata and "real data" are in the same database.

    Could we use a mixture of repository and files? i.e. have the database connection defined in the repository while the job and transformation files are on the file system?
    There are better ways now, I think... use variables for the database name, user id, and password.
    KISS ... Keep It Simple and Stupid
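In practice the variable approach might look like this: define the values in the kettle.properties file under the .kettle directory in your home directory, and reference them with ${...} placeholders in the connection dialog. The variable names below are made up for illustration:

```
# $HOME/.kettle/kettle.properties (variable names are illustrative)
DB_HOST=dbhost.example.com
DB_NAME=warehouse
DB_USER=etl_user
DB_PASS=secret
```

In the connection dialog you would then enter ${DB_HOST}, ${DB_NAME}, ${DB_USER}, and ${DB_PASS} in the corresponding fields, so the same job/transformation files work unchanged across environments.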

    Regards,
    Sven

  5. #5

    Default

    Thanks -

    Where is the code that does the logging? Can I provide my own implementation? How does this differ in 3.0? And finally - where is the code that shows the job / sub-job / transformation in Spoon if we run a job in Spoon?

    I would like to be able to build a similar report, showing where the error occurred. I am assuming that the count updates are only persisted to the database on completion - so the overhead will be minimal.

    Thanks.

  6. #6
    Join Date
    May 2006
    Posts
    4,882

    Default

    Where is the code that does the logging? Can I provide my own implementation? How does this differ in 3.0? And finally - where is the code that shows the job / sub-job / transformation in Spoon if we run a job in Spoon?
    It's sprinkled a little bit throughout the code; you can't provide your own implementation. It's pretty similar from 2.5 to 3.0.

    I would like to be able to build a similar report, showing where the error occurred. I am assuming that the count updates are only persisted to the database on completion - so the overhead will be minimal.
    What you will see in the log tables is less than what you see in Spoon... on JIRA there's still http://jira.pentaho.org/browse/PDI-42 open, for example. You could vote for it.

    Regards,
    Sven

  7. #7

    Default

    Thanks - is this functionality provided through the Tracker class? Are there any specific requirements for 3.0 that I should consider if I decide to take this on?


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.