Hey Kettle friends,
Recently I was asked whether PDI could automatically figure out, at runtime, the layout of a delimited text file simply by looking at the header of the file and by scanning the file to determine the delimiter, the data types of the columns and so on.
Now obviously you could also do without a header row, but then the columns would have to be called something like “field1” to “fieldN”.
First, we want to see which delimiter, from a list of possible delimiters, is used in the file. We do this by reading every line of the file in question and doing a character-by-character analysis of each line. We’ll count the number of occurrences of every candidate delimiter, then pick the most frequently used one at the end and set it as a variable.
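To make the counting idea concrete, here is a minimal standalone sketch in plain JavaScript. It is not the transformation's actual script: the function name `detectDelimiter`, the sample lines and the choice to return `null` when no candidate occurs are all assumptions made for illustration, and it works on an in-memory array of lines rather than on rows flowing through a step.

```javascript
// Hypothetical helper (not part of PDI): count how often each candidate
// delimiter occurs across all lines and return the most frequent one.
function detectDelimiter(lines, candidates) {
  var counts = {};
  var i, j;

  // Start every candidate at zero.
  for (i = 0; i < candidates.length; i++) {
    counts[candidates[i]] = 0;
  }

  // Character-by-character scan of every line.
  for (i = 0; i < lines.length; i++) {
    var line = lines[i];
    for (j = 0; j < line.length; j++) {
      var c = line.charAt(j);
      if (Object.prototype.hasOwnProperty.call(counts, c)) {
        counts[c]++;
      }
    }
  }

  // Pick the candidate with the highest count; null if none occurred.
  var best = null;
  var bestCount = 0;
  for (i = 0; i < candidates.length; i++) {
    if (counts[candidates[i]] > bestCount) {
      best = candidates[i];
      bestCount = counts[best];
    }
  }
  return best;
}

// Made-up sample data: ';' occurs 6 times, '/' 5 times, ',' once.
var sample = [
  "id;name;birth/date",
  "1;Alice, A.;1980/01/02",
  "2;Bob;1975/11/30"
];
var delimiter = detectDelimiter(sample, [',', ';', '/', '\t']);
```

Note that a plain frequency count can be fooled when another candidate character appears often inside the data itself, as the slashes in the dates nearly do here, so the result is a best guess rather than a guarantee.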
Detecting the delimiter


The hardest part of this is doing the counting in JavaScript. We’re using an associative array:

// A simple algorithm: count every ,;/ and tab, see which one is used most often
//
var delimiters = [ ',', ';', '/', '\t' ];
var counters = new Array();
var i;
for (i=0;i