Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Column length issue: max length of string data type

  1. #1

    Default Column length issue: max length of string data type

    Hi,

    Can any body please tell me the maximum length allowed by string in select values.

    I have some columns which are having length of 4000 characters.I am using text file output(tab separated).I set all the columns length to max values.But some times after certain length, data is truncating and appending with the immediate columns.

    Does anybody have some idea about this issue.

    Regards,
    Pradeep

  2. #2
    Join Date
    Jan 2009
    Posts
    1

    Default Same problem, truncating my string too

    I am having the same problem. My file gets truncated at 4600 chars. I'll attach it here. It is a file with a single text field, and the text field is compressed with GZIP. When the field is uncompressed there should be some malformed XML. Its not my file so don't give me hell about it, I'm just the consumer here.
    When I do a file load and use the gzip format on the setting and then preview, it shows the entire contents of the file. when I click the row, in the preview, it shows the truncated version. So when I try to add root xml tags to the xml string, it cuts off the end of the xml.
    You could put a .gz as the extension and load it that way. If I gzip -d the file first and then load it, I get the same result. Kettle still truncates the data.

    Update:
    I took out the separator ';' and now I get passed the 4600 char. the ';' is the default value. My record only has one line in it, and doesn't have a separator. Taking that out I get to ~6200 char. The file is still longer than that. It does not truncate on any special character at this position.
    Attached Files Attached Files
    Last edited by brianmrowe; 05-09-2009 at 09:07 AM. Reason: Added an update

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Hi Pradeep,

    the maximum length of a String in Kettle is 2^31-1 (or Java Integer.MAX_VALUE) = p, li { white-space: pre-wrap; }2,147,483,647


    Hi Brian,

    If things are the way you explain them, then this really is not a problem that can be solved. First of all, when you have binary data in a text file, it's no longer a text file. Second: once you GZip a piece of XML (itself a binary format since it can contain any possible encoding) you can end up with any possible byte in the result. Those bytes get interpreted as text using a certain encoding... or not during read-back.
    For example, if the result of the GZip stream contains a newline (bytes 10 or 13) your line of data will be trucated.

    To counter this reason, encodings like Base64 were invented to make sure no non-ASCII characters end up after binary compression/encoding.

    In short: sorry, nothing we can do about it. It's the law of preservation of missery. The idiots that created the file in the first place started it.

    Take care,
    Matt

    P.S. Just out of curiosity: did you post this question before or after your troll blog post?

  4. #4
    pstoellberger Guest

    Default

    As I said on IRC and as a comment in the blog post:
    take the text file input, choose fixed length columns, set length to 15000 and you get the whole string without a problem. i tested that with your data file

    didn't test but you can even say its a gzip file and the step will extract it for you

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Both solutions have their issues.

    Treat as GZip file: only one stream can be reliably uncompressed. The second is ignored.

    Fixed length columns : still dangerous because of text encoding issues.

  6. #6

    Default Column length issue: max length of string data type separated by commas

    Hi Matt,

    Thanks for your quick response. My source is MYSQL DB and destination is Text File. I have a data in my column separated by (commas) ','.I am getting proper output for with out ',' separated text even though column having data more than 4000 chars. I have problem of data truncating and appending with the other columns for ',' separated only.

    Could you please provide me your valuable inputs for handling data for the above requirement.

    Regards,
    Pradeep.

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    In the text file output fields you can specify a length. For fixed width file output, it's crucial you exactly get that width so we truncate and pad to that specified length.
    The easiest way to not have Kettle do that is simply to not specify lengths in the fields section. (leave the value blank)

    HTH,
    Matt

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.