Hitachi Vantara Pentaho Community Forums
Thread: Datatype Validation

  1. #1

    Datatype Validation

    I'm wondering how to validate the datatype of data I read in from a CSV file.

    I'm reading a text file containing dates in the format yyyyMMdd, and I've specified that format in the CSV Input step. But if a record contains bad data, the whole transform aborts. I see that it's not possible to Define Error Handling for a CSV Input step. So instead I thought I'd read the data from the CSV file as a String and then check the format in a later Data Validator step. The Data Validator has an option for "Verify Data Type" and allows me to specify a Conversion Mask, but this just seems to check the existing data type (i.e. String), rather than checking whether the String can be successfully converted into a Date.

    Is there a simple way to check whether a field in an input file is a Date in a specified format? Or do I have to write a Modified Java Script step to do this?

    I'm using Kettle 3.1.0.

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Two options that I see:

    1) Use the Text File Input step, and the Error Handling tab
    2) Write a JavaScript step

    Personally, I would go with the latter, but you have to start to question your source if you are getting data coming in with bad dates.

    What else in the data is invalid?
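
For option 2, the check itself could look roughly like the sketch below. It is plain JavaScript, not code taken from a Kettle transform: the function name `isValidYyyyMmDd` is hypothetical, and how the result is wired to an output field inside a Modified Java Script step is left as an assumption. It assumes the field was read as a String in yyyyMMdd format and returns a true/false flag that a later step could filter on.

```javascript
// Hypothetical helper (not part of Kettle): returns true only when `value`
// is a real calendar date written as exactly eight digits, yyyyMMdd.
function isValidYyyyMmDd(value) {
  // String() also copes with null/undefined, which simply fail the pattern.
  var text = String(value);
  if (!/^\d{8}$/.test(text)) {
    return false;
  }

  var year = parseInt(text.substring(0, 4), 10);
  var month = parseInt(text.substring(4, 6), 10);
  var day = parseInt(text.substring(6, 8), 10);

  if (month < 1 || month > 12 || day < 1) {
    return false;
  }

  // Gregorian leap-year rule decides whether February has 29 days.
  var isLeap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  var daysInMonth = [31, isLeap ? 29 : 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31];

  return day <= daysInMonth[month - 1];
}
```

The check is deliberately strict: a value such as 20090230 is rejected rather than rolled over into March, which a lenient date parser might do.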

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729


    3) Write a feature request for it in JIRA. Adding error handling to the data conversion code of CSV/Fixed Input is not all that hard to do.

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Of course!

    But I don't really think that's an option on its own. It's more of an "Oh, and while you are doing 1 or 2, do 3 as well."

  5. #5

    Thanks for the suggestions. I'll try out 1) and 2).

    The data file is quite large, so it's a pain if the whole thing aborts just because of a few bad records (especially when it is really difficult to locate the bad records: the Spoon error message doesn't tell me which record it was processing). Yes, it would be good if I could control the quality of the data I'm receiving!
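
One way to locate the bad records before running the transform is a small standalone scan that reports line numbers. The sketch below is only an illustration under stated assumptions: `isYyyyMmDd` and `findBadDateRecords` are hypothetical names, the lines are assumed to be already loaded into an array, and fields are split on a plain delimiter with no support for quoted values.

```javascript
// Hypothetical validator: true only for a real calendar date in yyyyMMdd form.
function isYyyyMmDd(text) {
  var m = /^(\d{4})(\d{2})(\d{2})$/.exec(String(text));
  if (!m) {
    return false;
  }
  var y = +m[1], mo = +m[2], d = +m[3];
  var leap = (y % 4 === 0 && y % 100 !== 0) || y % 400 === 0;
  var days = [31, leap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return mo >= 1 && mo <= 12 && d >= 1 && d <= days[mo - 1];
}

// Hypothetical scanner: returns the 1-based line number and raw value of
// every record whose field at `fieldIndex` is not a valid yyyyMMdd date.
// Assumption: a naive split on `delimiter`, so quoted fields are not handled.
function findBadDateRecords(lines, fieldIndex, delimiter) {
  var bad = [];
  for (var i = 0; i < lines.length; i++) {
    if (lines[i] === "") {
      continue; // ignore blank lines, e.g. a trailing newline
    }
    var value = lines[i].split(delimiter)[fieldIndex];
    if (!isYyyyMmDd(value)) {
      bad.push({ lineNumber: i + 1, value: value });
    }
  }
  return bad;
}

// Made-up sample data, for illustration only.
var sample = ["1,20090315", "2,20090230", "3,not-a-date", "4,20080229"];
console.log(findBadDateRecords(sample, 1, ","));
```

With the made-up sample above, the second and third lines are reported, since February 30 does not exist and "not-a-date" is not eight digits.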

  6. #6
    Join Date
    Mar 2009
    Posts
    9

    Date Validation

    This is a must. I have been doing this for 20 years and have never had any control over what is given to me. Basic sanity checks are a necessity. If the step could generate a flag that would allow me to filter the bad records off, that would be sufficient.

  7. #7
    Join Date
    Jul 2016
    Posts
    17

    Text file input works well, thanks!

    Text file input works well, thanks! But I don't understand why the same doesn't work for CSV.

    Quote, originally posted by gutlez:
    Two options that I see:

    1) Use the Text File Input step, and the Error Handling tab
    2) Write a JavaScript step

    Personally, I would go with the latter, but you have to start to question your source if you are getting data coming in with bad dates.

    What else in the data is invalid?

  8. #8
    Join Date
    Apr 2008
    Posts
    4,696

    Because no one wanted the error handling feature enough to file the Jira that Matt suggested.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.