
View Full Version : Using a Custom Format for fields in Json Input step



dcarter
06-20-2012, 01:36 PM
I need to extract data from MongoDB, just like the tutorial at http://wiki.pentaho.com/display/BAD/Create+a+Report+with+MongoDB



Here is an example of one of the field entries:



{"$date":"2012-03-01T00:00:00.000Z"}


This is ISO 8601 format. My problem is that my date data doesn't match any of the pre-set formats in the pulldown menu on the 'Fields' tab of the "Json Input" step.



One of the pre-set formats, for example, is

yyyy/MM/dd HH:mm:ss.SSS
However, as you can see, my data uses dashes instead of forward slashes, has a 'T' instead of a space between the date and time, and has a trailing 'Z' at the end of each timestamp. I have tried entering the format manually, i.e.


yyyy-MM-ddTHH:mm:ss.SSSZ


But the non-standard 'T' and 'Z' characters (among other possible issues) are causing it to fail. Is there some syntax or method I can use in the format field to efficiently pull out this date data?
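For context (this is background, not part of the original post): Kettle's date masks are Java SimpleDateFormat patterns, and plain Java shows exactly why the unquoted mask fails. 'T' is not a legal SimpleDateFormat pattern letter, so the pattern is rejected before any data is even parsed. A minimal sketch:

```java
import java.text.SimpleDateFormat;

public class FailingPattern {
    public static void main(String[] args) {
        try {
            // 'T' is not a valid SimpleDateFormat pattern letter,
            // so constructing this format throws IllegalArgumentException
            new SimpleDateFormat("yyyy-MM-ddTHH:mm:ss.SSSZ");
            System.out.println("pattern accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("pattern rejected: " + e.getMessage());
        }
    }
}
```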



Thank you so much,


Dan

marabu
06-20-2012, 04:14 PM
Déjà-vu?

http://forums.pentaho.com/showthread.php?97484-Date-parsing-error

cedricdevroey
06-20-2012, 04:49 PM
Hi Dan,

It's a workaround, but the following will work:
Read the date in as a string and feed it to a "Regex Evaluation" step with a regex like this: (\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}\.\d{3})Z
See the (...) (...) in there? Those are the capture groups: the first one captures your date, the second one your time.
Then use a Calculator step to add the time to the date in a new timestamp field.

Link to example transformation file (.ktr): https://docs.google.com/open?id=0BwUh7Ex41wyBTnBEM3Z6UzZaeG8
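The same capture-group idea can be checked in plain Java (this sketch is mine, not from the example .ktr; Kettle's Regex Evaluation step uses java.util.regex underneath):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        String raw = "2012-03-01T00:00:00.000Z";
        // Two capture groups, as in the workaround: (date) and (time)
        Pattern p = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2})T(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})Z");
        Matcher m = p.matcher(raw);
        if (m.matches()) {
            System.out.println(m.group(1)); // 2012-03-01
            System.out.println(m.group(2)); // 00:00:00.000
        }
    }
}
```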


Grtz,

Cedric
Self-proclaimed Data Ninja

cedricdevroey
06-20-2012, 06:54 PM
Uhm, Dan, a date with the format string yyyy-MM-dd'T'HH:mm:ss.SSS'Z' also works (the single quotes make SimpleDateFormat treat T and Z as literal characters), so forget my complicated workaround from earlier ;-)
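For the record, here is the quoted-literal mask working in plain Java (my sketch, assuming a sample timestamp like the one in the original post; note that treating 'Z' as a literal ignores the zone, so the zone is pinned to UTC explicitly):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class QuotedLiterals {
    public static void main(String[] args) throws Exception {
        // Single quotes mark T and Z as literal text in the pattern
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        // The trailing Z in ISO 8601 means UTC; set it explicitly,
        // since the quoted 'Z' literal carries no zone information
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date d = fmt.parse("2012-03-01T00:00:00.000Z");
        System.out.println(fmt.format(d)); // 2012-03-01T00:00:00.000Z
    }
}
```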