Hitachi Vantara Pentaho Community Forums

Thread: Job compress files based on their creation date

  1. #1
    Join Date
    Jun 2007
    Posts
    138

    Default Job compress files based on their creation date

    Hello Folks,

    I have been trying to create a job to implement a use case, but I have not been able to find the appropriate tools.

    I have Kettle 3.0 (we finally upgraded!) and I am building a job in Spoon. I have implemented something similar with MS SQL Server BI Studio, but I am not sure whether I can do this with a Kettle job.

    The use case is as follows:

    Files keep arriving in the same location, and I need to archive them all based on their creation date. Most of them will be CSV files, and the OS is Windows.

    The job should start when:

    Free space on that particular Windows drive drops below 20 GB
    (if this were Unix, I could have done it easily using a Unix job or Autosys)

    Each time it runs, it should zip files based on their creation date. It should zip all the files, except those that arrived in the last 30 days. There should also be a filter that sends a mail based on the status of the compression, if it happens.

    Start point -> keep checking free disk space. When it drops below 20 GB, initialize ().
    (Alternative: run this job once a week)
    Do
    Find files older than 30 days -> zip them -> if success, send a mail with that status.
    -> if failure, send a mail with the failure logs.
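
    For reference, the free-space trigger described above could be sketched outside Kettle in Python. This is only a minimal sketch, not a Kettle feature: the function name and the threshold constant are hypothetical, and 20 GB is taken as 20 * 1024^3 bytes.

```python
import shutil

# Hypothetical threshold: 20 GB, interpreted here as 20 * 1024**3 bytes.
THRESHOLD_BYTES = 20 * 1024**3


def free_space_below(path, threshold_bytes=THRESHOLD_BYTES):
    """Return True when the volume holding `path` has less free space than the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free < threshold_bytes
```

    shutil.disk_usage works on both Windows and Unix, so a scheduled script could call free_space_below("D:\\") (the drive letter is an assumption) and start the archiving job only when it returns True.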


    I had a hard time finding conditional filters and a way to get the file creation time in a job. Please guide me further.

    If I can't get files based on their creation time, then I might have to use something else. I don't want to complicate this simple task unnecessarily.

    I have attached a rough skeleton of the job I have created.

    Thanks in advance!
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    The "Get file names" step also gives you the dates of a file, which you can then filter with a filter step and pass on to a job via "copy rows to result".
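
    To illustrate the logic only (this is plain Python, not Kettle), the "get the file dates, then filter" idea might look like the sketch below. The function name and its parameters are hypothetical, and it uses the last-modified time rather than the creation time.

```python
import os
import time


def files_older_than(folder, days=30, now=None):
    """Return sorted paths of regular files in `folder` last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    selected = []
    for entry in os.scandir(folder):
        # Skip sub-directories; compare the last-modified timestamp with the cutoff.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            selected.append(entry.path)
    return sorted(selected)
```

    The returned list plays the role of the rows that the transformation would hand over to the job.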

    Regards,
    Sven

  3. #3
    Join Date
    Jun 2007
    Posts
    138

    Default job

    Thanks, Sven.

    Sorry for the late reply. I have been ill for the last few days!

    I didn't understand the approach you described. I don't have much experience with Kettle jobs, so my questions are as follows:

    "Get file names" and the filter are not available in a job, so I will create a transformation for them.

    I will follow these steps:

    a) Create/initiate a job
    b) Call a transformation
    c) "Get file names" step
    Question 1) How do I get the file timestamp/date using the file name? Does this step return the creation date as well?
    d) Filter based on date

    e) Then what?

    Question 2) How do I return to the job, and what do I do there?
    I couldn't find "copy rows to result" in the transformation toolset! (Kettle 3.1)

    Question 3) Can I check space usage on a Windows/Unix (Sun) drive?

    Thanks a lot in advance!
    Regards,
    kedar.mehta@tcs.com ,
    Tata consultancies Ltd

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    "Get file names" has a last modified date as an output field.

    "Copy rows to result" is under the Job category in the transformation view.

    After that, it depends. For example, there is a zip job entry.
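
    As a rough equivalent of the zip step outside Kettle, the selected files could be archived with Python's standard zipfile module. This is a sketch under assumptions: zip_files and its arguments are hypothetical names, and files are stored under their base names only.

```python
import os
import zipfile


def zip_files(paths, archive_path):
    """Write the given files into a new compressed zip archive and return its path."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its base name so the archive has no directory prefixes.
            archive.write(path, arcname=os.path.basename(path))
    return archive_path
```

    A caller could wrap zip_files in try/except and send the success or failure mail from there.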

    Regards,
    Sven


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.