Hitachi Vantara Pentaho Community Forums

Thread: Input CSV 'Get Fields' giving me incorrect data types?

  1. #1
    Join Date
    Nov 2016
    Posts
    2

    Default Input CSV 'Get Fields' giving me incorrect data types?

    Am I using this step incorrectly? When I input data from a CSV I click 'Get Fields' to have Kettle determine the best compatible data type (sampling the entire data set), and I keep running into issues where the recommended data type can't actually be imported. I use the data types it suggests, then run the import, and it fails because it can't convert a value to an integer... shouldn't this be detected when it classifies the data type? Is there another step I should use to scan the data set that offers a more precise answer (if not all values can be Integers, choose String, for instance)? I've tried lazy conversion both enabled and disabled, with the same issue.

    Thanks!

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Wait, you are using a Spoon convenience feature at design time and hope for the best at run time?
    Kettle tries very hard to infer field metadata from data samples, but there's no guarantee of zero conversion errors.
    So, you can hope, but you can't demand.
    If there's a chance for errors you should enable error handling.
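    PDI's step error handling diverts rows that fail conversion to a separate error hop instead of aborting the transformation. Outside of PDI, the same idea looks roughly like this minimal Python sketch (the row layout and field name are made up for illustration; this is not PDI's API):

```python
# Sketch of the "error handling" idea: rows whose type conversion fails
# are routed to an error stream, and the run continues with the good rows.

def convert_rows(rows, field="qty"):
    good, errors = [], []
    for row in rows:
        try:
            row[field] = int(row[field])   # the conversion that may fail
            good.append(row)
        except (ValueError, TypeError) as exc:
            # the "error hop": keep the failing row plus a reason
            errors.append({"row": row, "error": str(exc)})
    return good, errors

good, errors = convert_rows([{"qty": "3"}, {"qty": "N/A"}])
```

    In PDI you get the same split by right-clicking the step and defining error handling, then wiring the error hop to a step that logs or repairs the bad rows.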
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Oct 2014
    Posts
    15

    Default

    You can always try the Data Validator step to check the whole file and verify your data types.
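    The point of checking the whole file is that a single bad value should demote the column's type. As a rough illustration of that rule (a standalone Python sketch, not the Data Validator step itself; the sample data is invented):

```python
import csv
import io

# Only call a column Integer if every non-empty value in the file parses
# as an integer; otherwise fall back to String.

def infer_column_type(rows, column):
    for row in rows:
        value = row[column].strip()
        if value == "":
            continue  # treat empty strings as nulls
        try:
            int(value)
        except ValueError:
            return "String"  # one bad value demotes the whole column
    return "Integer"

sample = io.StringIO("id,qty\n1,10\n2,oops\n3,30\n")
rows = list(csv.DictReader(sample))
print(infer_column_type(rows, "qty"))  # "oops" can't convert, so String
print(infer_column_type(rows, "id"))   # every value parses, so Integer
```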

  4. #4
    Join Date
    Apr 2016
    Posts
    156

    Default

    Quote Originally Posted by jdavid459 View Post
    Am I using this step incorrectly?
    No, you're not using it incorrectly -- as @marabu mentions above, the 'Get Fields' data type sensing (even if you select all rows) is a convenience feature; it isn't meant to be foolproof.

    The feature (Get Fields --> autopopulate data types) is helpful for reading headers and basic data types. For strongly typed input data, it's great. In a case like yours, where there are outliers, you as the ETL designer have to accommodate them.

    In PDI terms, this might mean identifying the data elements you can't assume are strongly typed, importing them as something simple (e.g. String), and then using a combination of steps downstream of the CSV Input to check the data type of certain fields (and, if a field doesn't match the data type you want, setting it to a correct value).
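    The "import as String, fix downstream" pattern can be sketched like this in plain Python (the values and default are illustrative; in PDI you'd do the equivalent with steps such as Select Values or a Filter Rows branch):

```python
# Read every field as text, then convert with an explicit fallback
# instead of letting one bad value abort the whole load.

def to_int_or_default(value, default=None):
    try:
        return int(value.strip())
    except (ValueError, AttributeError):
        return default  # flag or route the row instead of failing

cleaned = [to_int_or_default(v) for v in ["42", " 7 ", "n/a", ""]]
```

    Rows that come back with the default can then be filtered out or sent to a repair branch, which is exactly the kind of accommodation meant above.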
    My runtime environment: MacOS, JDK 1.8u121, PDI 7.0


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.