Exclude first lines of a CSV file
I'm working with some "almost CSV" files as input. The only difference between these files and "strict CSV" is that the first 6 lines contain comments, dates and so on (not structured at all). The real delimited structure only starts after the 6th line.
Is there a way to ignore the first lines before treating the file as CSV, or at least a best practice or trick?
Is that number of lines absolutely always constant?
I would just use tail on the file before reading it in PDI. If that's absolutely not possible, then you could always CSV-input it as one massive string, ignore the first 6 rows, write it to another file and then load that file.
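With a POSIX `tail`, `tail -n +7 input.txt > clean.csv` prints the file starting at line 7, i.e. it drops the first 6 lines. Where `tail` isn't available (for example on Windows), the same pre-processing can be sketched in Python. This is only an illustration, assuming the preamble is always exactly 6 lines and the files are UTF-8; `HEADER_LINES`, `skip_leading_lines` and `copy_without_preamble` are hypothetical names, not part of PDI.

```python
import itertools
from typing import Iterable, Iterator

# Assumption: the free-form preamble is always exactly 6 lines.
HEADER_LINES = 6


def skip_leading_lines(lines: Iterable[str], n: int = HEADER_LINES) -> Iterator[str]:
    """Yield every line after the first n (same effect as `tail -n +7` when n=6)."""
    return itertools.islice(lines, n, None)


def copy_without_preamble(src_path: str, dst_path: str, n: int = HEADER_LINES) -> None:
    """Write src_path to dst_path minus its first n lines, leaving a plain
    delimited file that a Text File Input step can read directly.

    Assumption: both files are UTF-8; newline="" keeps line endings unchanged.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(skip_leading_lines(src, n))
```

The cleaned file is then what the transformation reads, so the Text File Input step itself needs no special handling.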
Good tip by codek.
Another possible solution, provided the files are consistent:
- Text file input with a single field; on the "Content" tab, check "Rownum in output" and provide a field name
- Filter rows: rownumber > 6
- Split fields step: provide a delimiter ("|" in your case) and give each field a name and the rest of its metadata
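The three steps above can be mirrored outside PDI to see what each one does to the data. This is a rough Python sketch, not PDI's implementation: it assumes row numbers start at 1, that the delimiter is "|", and that the field names are supplied by hand, as in the Split fields step. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

SKIP_ROWS = 6     # Filter rows condition from the post: rownumber > 6
DELIMITER = "|"   # delimiter used in the original poster's files


def numbered_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Step 1: read each whole line as one field plus a 1-based row number
    (roughly what "Rownum in output" adds in Text file input)."""
    for rownum, line in enumerate(lines, start=1):
        yield rownum, line.rstrip("\r\n")


def filter_rows(rows: Iterable[Tuple[int, str]], skip: int = SKIP_ROWS) -> Iterator[Tuple[int, str]]:
    """Step 2: keep only rows whose row number is greater than `skip`."""
    return ((n, text) for n, text in rows if n > skip)


def split_fields(rows: Iterable[Tuple[int, str]], names: List[str],
                 delimiter: str = DELIMITER) -> Iterator[Dict[str, str]]:
    """Step 3: split the single field into named fields.

    zip() stops at the shorter sequence, so missing values are simply absent
    and extra values are dropped; a stricter version might check the count.
    """
    for _, text in rows:
        yield dict(zip(names, text.split(delimiter)))


# Usage: pipeline = split_fields(filter_rows(numbered_lines(f)), ["id", "name", "amount"])
```

If line 7 of the real files is a column header rather than data, the filter threshold would need to be 7 instead of 6.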
Would it be useful?
I tried Ato's solution and it's working. It's a bit annoying to have to define each field manually in the "Split fields" step, but it's working great.
Codek: I like your proposal, but is there a way to do this tail directly in Kettle, or do you mean running "tail" on the file before passing it to Kettle?
Thanks to both of you.
When this has come up before, I have suggested making a template file:
Take a sample file and strip out the lines of offending data, leaving the good headers in place.
Build your Text File Input step, using these good headers and the sample data.
Adjust the number of lines to skip and point the Text File Input to your good file.
It works quite cleanly.
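The template-file idea is that the field definitions come from a clean header once, and only the skip count changes for the real files. A small Python sketch of the same idea follows. It assumes "|" as the delimiter and that each real file has 6 preamble lines followed by one header line (so 7 lines are skipped); `fields_from_template` and `read_with_template` are hypothetical names, and this is not how PDI works internally.

```python
import csv
import itertools
from typing import Dict, Iterable, Iterator, List


def fields_from_template(template_lines: Iterable[str], delimiter: str = "|") -> List[str]:
    """Take the column names from the header of the cleaned-up template file."""
    return next(csv.reader(template_lines, delimiter=delimiter))


def read_with_template(data_lines: Iterable[str], fields: List[str],
                       skip_lines: int = 7, delimiter: str = "|") -> Iterator[Dict[str, str]]:
    """Skip the preamble plus the header line (assumed 6 + 1 = 7 lines), then
    map each remaining row onto the field names taken from the template."""
    rows = itertools.islice(data_lines, skip_lines, None)
    return csv.DictReader(rows, fieldnames=fields, delimiter=delimiter)
```

Both functions accept any iterable of lines, so an open file works as well as a list; open real files with `newline=""` as the `csv` module recommends.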