Hitachi Vantara Pentaho Community Forums

Thread: Feature enhancements expected

  1. #1

    Feature enhancements expected

    Attachment: chef_sample.jpg

    Hi Matt,

    I've been evaluating Kettle to see if it can solve some of my complex data loading/transformation requirements, and I got stuck on some show stoppers! Following are some enhancements that I feel would add great value.



    1. In the input steps (text, Excel, XBase, ...), I noticed that the row number is not reset when multiple files are read. Is it possible to have an option to reset the row number to 1 when the source file changes?



    2. XBASE input step: is it possible to accept wildcard input files (*.dbf) from specific directories, as the other input steps do?



    3. Table Output step: is it possible to have an option to COMMIT only if a complete source file is successfully processed? (Assume I am reading *.csv in the input step; then commit with every change in source file.) This could be implemented as: if the value of input column "_" changes, then COMMIT.



    4. In SPOON: there is definitely a requirement to read multiple files (*.csv) from a source directory and move each processed file to a processed or error directory at the end of the transformation. This is better than doing complex workarounds in CHEF, such as creating a shell script that knows which files were successfully processed and which files had errors.



    5. In CHEF, is it possible to have a directory reader step? Let's say it gets *.csv filenames. For each file found, the step provides the filename to a transformation as a parameter. Once the transformation is completed, I can then have a shell script move the file to a processed_files or error_files directory.



    6. In CHEF, is it possible for the SMTP step to pick the email addresses (to, cc, bcc) from a database query? If I've got 100 jobs set up, managing them is a nightmare because the email addresses keep changing.
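
Items 1 and 3 above can be pictured with a small sketch outside Kettle. This is plain Python with sqlite3, not how Kettle implements or would implement either feature. The function name `load_csv_files`, the `staging` table and the rule that a row with a different column count makes the whole file fail are all assumptions made for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_files(directory, conn):
    """Load every *.csv in `directory`, committing once per fully read file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging "
        "(filename TEXT, rownumber INTEGER, line TEXT)"
    )
    loaded, failed = [], []
    for path in sorted(Path(directory).glob("*.csv")):
        try:
            with path.open(newline="") as handle:
                width = None
                # enumerate(..., start=1) restarts the row number for each file.
                for rownumber, row in enumerate(csv.reader(handle), start=1):
                    width = len(row) if width is None else width
                    if len(row) != width:  # hypothetical "bad file" rule
                        raise ValueError(f"row {rownumber} in {path.name}")
                    conn.execute(
                        "INSERT INTO staging VALUES (?, ?, ?)",
                        (path.name, rownumber, ",".join(row)),
                    )
            conn.commit()    # the whole file was read: keep its rows
            loaded.append(path.name)
        except (ValueError, csv.Error, sqlite3.Error):
            conn.rollback()  # discard the partial rows of this file only
            failed.append(path.name)
    return loaded, failed
```

The two returned lists are what a later "move to processed or error directory" step would need, which is the link to items 4 and 5.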



    Regards
    Biju

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    RE: Feature enhancements expected

    Hi Biju,

    For your tracking pleasure I added the following trackers:

    Change Request - [# 1697] TextFileInput: reset rownumber when a new file is processed.
    Change Request - [# 1698] XBASE Input : add support for wildcards for filenames
    Change Request - [# 1699] Chef: Create new "For each" job entry
    Change Request - [# 1700] SMTP job entry: allow parameterising

    As for the other questions:

    3) The database handling will change dramatically towards the end of the year.
    In the meantime, just set the commit size to 99999999 (turn off batch processing).

    4) Move processed files to another directory:
    - add the filename to the output rows.
    - split the stream to a select values/Unique rows combination to get the filenames
    - send the filenames to a "Copy rows to result" step.
    - In chef, send the filenames to a script that moves the files ONLY if all went well.
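
The four steps above can be sketched in Python to show the intended logic. This stands in for the script that Chef would call and is not Kettle code: the function name `move_processed_files`, the `"filename"` key on each row and the `all_went_well` flag are hypothetical.

```python
import shutil
from pathlib import Path


def move_processed_files(rows, processed_dir, all_went_well):
    """Move each distinct source file named in `rows`, but only on success."""
    if not all_went_well:
        return []  # leave every file in place so the run can be repeated
    # dict.fromkeys drops duplicate filenames and keeps first-seen order,
    # which is the job of the Select values / Unique rows combination.
    filenames = list(dict.fromkeys(row["filename"] for row in rows))
    target = Path(processed_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in filenames:
        shutil.move(name, str(target / Path(name).name))
        moved.append(Path(name).name)
    return moved
```

Because the list is built from the rows that were actually read, files that arrive in the directory during the run are not moved.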

    Hope this helps,
    Matt

  3. #3

    RE: Feature enhancements expected

    Hi Matt,

    Thanks for all the trackers. I'll keep hoping that you get time soon to work on them.

    I haven't really used "Copy rows to result", as it is not clear in the documentation and there are no examples.

    Could you explain how I can get a row (let's say a filename) from a transformation and use it in a shell script, as mentioned in your suggestion?

    Thanks
    Biju.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    RE: Feature enhancements expected

    Attachment: chef_sample.jpg

    Hey Biju,

    Why make it so difficult?
    Just process files *.csv in a directory and if this goes well in the transformation, move *.csv to another directory.
    The solution I mentioned (shown in the attachment) is only needed if you're working in an async environment, where files are continuously being put into the directory.

    Take care,
    matt

  5. #5
    Join Date
    Sep 2005
    Posts
    1,403

    RE: Feature enhancements expected

    Hi Matt,

    Thanks mate, your example has made life easier.

    However, a new doubt has cropped up. Yes, files are added to the directory continuously. Let's say there are 20 files in the input directory when the transformation runs. Are you saying script.bat will be called once with all the filenames in one parameter list, or will it be called once per filename?

    script.bat 1.csv 2.csv 3.csv .............. 20.csv

    or is it

    script.bat 1.csv
    script.bat 2.csv
    ....
    script.bat 20.csv

    The reason I ask is that I remember from my golden days of DOS batch programming that there is a limit to the number of parameters a shell script can receive. In any case, your solution has opened up new logic to try out...

    Thanks
    Biju.

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    RE: Feature enhancements expected

    Hi Biju,

    You need to use SHIFT in a loop and always refer to %1 (or $1 in a Unix shell).
    Something like say...

    --------------------------------------------------------------------
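    @echo off
    REM Log every argument passed to this script, one per line.

    echo Files to be moved: > C:\Temp\script.log

    :LOOP
    REM %1 is empty once SHIFT has consumed all arguments.
    IF "%1"=="" GOTO DONE
    echo %1 >> C:\Temp\script.log
    SHIFT
    GOTO LOOP

    :DONE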
    --------------------------------------------------------------------

    Found this on the web; however, I'm not sure which versions of DOS it will work on ;-(
    YMMV, no guarantees, I'm not a DOS wizard...

    The ForEach job entry (under investigation/construction) will make life easier for the non-script kiddies among us.

    As far as timing is concerned, things are very busy around here, but I try to do a couple of feature requests here and there. However, you should not worry, things are bound to improve dramatically pretty soon :-)

    Cheers,
    Matt

  7. #7

    RE: Feature enhancements expected

    Fantastic. Thanks again mate.....

    Cheers
    Biju.

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    RE: Feature enhancements expected

    Hey Biju & All,

    PLEASE be careful when dealing with async processes.
    All kinds of race conditions can pop up!
    I suggest copying (FTP, SFTP, copy, whatever) the text files into the directory with an extension such as .temp and then renaming them to .csv right after. The rename should be atomic on most OSes.
    That way, you don't risk Kettle processing partial files.
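
On the producer side, the write-then-rename pattern could look like the sketch below. It is written in Python purely for illustration; the function name `publish_file` is hypothetical, and the atomicity of the rename assumes the temporary and final names are on the same filesystem.

```python
import os
from pathlib import Path


def publish_file(directory, name, data):
    """Write `data` under a .temp name, then rename it to its final name."""
    final_path = Path(directory) / name
    temp_path = final_path.with_name(final_path.name + ".temp")
    with open(temp_path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())  # push the bytes to disk before the rename
    # A reader that only picks up *.csv never sees the half-written .temp file.
    os.replace(temp_path, final_path)
    return final_path
```

For example, `publish_file("incoming", "orders.csv", payload)` leaves only `orders.csv` in the directory once it returns.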

    If you know about these things, then fine, otherwise please say: "Yes Matt, I understand".

    ;-)

    Matt

  9. #9
    Join Date
    Nov 1999
    Posts
    459

    RE: Feature enhancements expected

    If you want to process all files in a directory, create a batch file (.bat or .cmd) with the content:
    FOR %%I IN (c:\*.csv) do call c:\kettle\pan.bat [your parameters] %%I
    [Call is needed because pan is a batch file, too.]

    So you don't run into the limit on the number of parameters (I think 10 parms is the max).

    As Matt mentioned above, the rename process is the easiest solution for async processing in DOS.
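
The same one-call-per-file idea can be sketched in Python for anyone not scripting in DOS. The function name `commands_per_file` and its arguments are hypothetical; the sketch only builds the command lines and does not assume anything about which parameters pan.bat expects.

```python
from pathlib import Path


def commands_per_file(directory, runner, extra_args=()):
    """Return one argument list per *.csv file instead of one long command line."""
    return [
        [runner, *extra_args, str(path)]
        for path in sorted(Path(directory).glob("*.csv"))
    ]
```

Each returned list could then be passed to `subprocess.run`, so every invocation receives exactly one filename however many files are in the directory.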

    HTH,
    Jens

  10. #10

    RE: Feature enhancements expected

    Thanks guys for the warnings.

    I am currently handling the "partial file problem" as follows:

    For each file in the source folder :
    1. Rename the file in the source directory as ???.dat (if the rename worked then it is a whole file as the OS has no locks on the file)

    2. All renamed files are then MOVED into a "work directory" and then used by the transformation process.

    3. Processed files need to be then moved into the "Processed directory" and Error files need to be moved to the "Error directory".

    4. Error file log is emailed to the administrator and concerned users are alerted regarding the availability of the processed data.
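
Steps 1 to 3 above could be sketched as follows. This is an illustrative Python version, not the batch script actually in use: the function names `claim_files` and `file_away` are hypothetical, the .dat name is assumed to keep the original file stem, and the rename failing on a locked file is Windows behaviour that other systems may not share. The email step is left out.

```python
import shutil
from pathlib import Path


def claim_files(source_dir, work_dir):
    """Rename each *.csv to *.dat, then move the renamed file to work_dir."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    claimed = []
    for path in sorted(Path(source_dir).glob("*.csv")):
        renamed = path.with_suffix(".dat")
        try:
            path.rename(renamed)  # may fail while another process holds the file
        except OSError:
            continue              # still being written: retry on the next run
        claimed.append(Path(shutil.move(str(renamed), str(work / renamed.name))))
    return claimed


def file_away(path, succeeded, processed_dir, error_dir):
    """Move a worked file to processed_dir or error_dir depending on the result."""
    target = Path(processed_dir if succeeded else error_dir)
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(target / Path(path).name)))
```

The transformation would run on each path returned by `claim_files`, and its outcome decides the `succeeded` flag passed to `file_away`.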

    With the batch scripting logic you mentioned, I guess the full cycle can now be accomplished.

    Thanks
    Biju.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.