Job compress files based on their creation date



kedar mehta
04-04-2009, 08:01 PM
Hello Folks,

I have been trying to create a job to implement a use case, but I was not able to find the appropriate tools.

I have Kettle 3.0 (finally, we upgraded!) and am using a Spoon job. I have implemented a similar thing using MS SQL Server BI Studio, but I am not sure if I can do this with a Kettle job.

The use case is as follows:

Files keep arriving at the same location, and I need to archive them all based on their creation date. The files will mostly be CSV files and the OS is Windows.

The job should start when:

Free space on that particular Windows drive drops below 20 GB
(If this were Unix, I could have done it easily using a Unix job or AutoSys.)
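Outside Kettle, the trigger condition above could be sketched in Python roughly as follows. This is only an illustration, not a Kettle feature: the function name `free_space_below` and the default path are made up for the example, and 20 GB is read here as 20 GiB.

```python
import shutil

# 20 GiB, the limit mentioned in the post (assumption: binary gigabytes)
THRESHOLD_BYTES = 20 * 1024**3


def free_space_below(path, threshold_bytes=THRESHOLD_BYTES):
    """Return True when the drive holding `path` has less free space than the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free < threshold_bytes


if __name__ == "__main__":
    # "." is a placeholder; on Windows this could be a drive root such as "D:\\".
    print(free_space_below("."))
```

`shutil.disk_usage` works on both Windows and Unix, so the same check could be scheduled on either system and used to decide whether to start the archive job.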

Every time it runs, it should zip files based on their creation date. It should zip all the files, but it shouldn't compress files which arrived in the last 30 days. There should be a filter which sends a mail based on the status of the compression, if it happens.

Start point -> keep checking disk space usage. When free space crosses the 20 GB mark, initialize().
(Alternative: run this job once a week)
Do
Find files older than 30 days -> zip them -> if success, then send a mail with that status.
-> if failure, then send a mail with the failure logs.
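The zip-and-report part of the flow above might look like this as a plain Python sketch. It assumes the list of files to archive has already been selected, and it only builds the status mail rather than sending it. The names `zip_files` and `archive_and_report` and the subject lines are invented for the example and are not Kettle job entries.

```python
import os
import zipfile
from email.message import EmailMessage


def zip_files(paths, archive_path):
    """Write the given files into one compressed archive, stored by base name."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))


def archive_and_report(paths, archive_path):
    """Zip `paths` and return a status mail; return None when there is nothing to zip."""
    if not paths:
        return None  # no compression happened, so no mail is produced
    msg = EmailMessage()
    try:
        zip_files(paths, archive_path)
        msg["Subject"] = "Archive job succeeded"
        msg.set_content(f"Zipped {len(paths)} file(s) into {archive_path}")
    except OSError as exc:
        # Missing or unreadable files end up here; the error text goes into the mail body.
        msg["Subject"] = "Archive job failed"
        msg.set_content(f"Zipping failed: {exc}")
    return msg
```

The returned message could then be handed to `smtplib` or any other mailer. The original files are left in place; deleting them after a successful zip would be a separate step.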


I had a hard time finding conditional filters and a way to get the file creation time in a job. Please guide me further.

If I can't get files based on their creation time, then I might have to use something else. I don't want to complicate this simple task unnecessarily.

I have attached a rough skeleton of the job which I have created.

Thanks in advance!

sboden
04-05-2009, 04:46 AM
The Get file names step also gives you the dates of a file, which you can then filter with a filter step and pass on to a job via "copy rows to result".
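As a rough analogy in Python (not Kettle itself), the list-then-filter idea can be sketched like this. Each file becomes a row carrying its last-modified time, and a filter keeps only the rows older than 30 days. The function names and the dictionary keys `filename` and `lastmodifiedtime` are illustrative and not guaranteed to match Kettle's field names.

```python
import os
from datetime import datetime, timedelta


def get_file_names(folder, suffix=".csv"):
    """Yield one row per matching file, carrying its path and last-modified time."""
    for entry in os.scandir(folder):
        if entry.is_file() and entry.name.lower().endswith(suffix):
            yield {
                "filename": entry.path,
                "lastmodifiedtime": datetime.fromtimestamp(entry.stat().st_mtime),
            }


def filter_rows(rows, older_than_days=30, now=None):
    """Keep only rows whose file was last modified more than `older_than_days` ago."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=older_than_days)
    return [row for row in rows if row["lastmodifiedtime"] < cutoff]
```

The list returned by `filter_rows` plays the role of the rows handed back to the job, which can then process each file in turn.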

Regards,
Sven

kedar mehta
04-13-2009, 06:10 PM
Thanks Sven.

Sorry for the late reply. I have been ill for the last few days! ;)

I didn't understand the approach you mentioned. Actually, I don't have much experience with Kettle jobs, so my questions are as follows:

Get file names and filter are not available in a job, so I will be creating a transformation for that.

I will go with these steps:

a) Create/initiate a job
b) Call a transformation
c) Get file names step
Question 1) How do I get the file timestamp/date using the file name? Does this step return the creation date as well?
d) Filter it based on date

e) Then what?

Question 2) How do I return to the job, and what do I do there?
I couldn't find "copy rows to result" in the transformation tool set! (Kettle 3.1) :(

Question 3) Can I check space usage on a Windows/Unix (Sun) drive?

Thanks a lot in advance!

sboden
04-14-2009, 02:10 AM
"Get file names" has a last modified date as an output field.

"Copy rows to result" is under the Job category in the transformation view.

After that it depends :) ... there's a zip job entry, for example.

Regards,
Sven