Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Split data from irregular CSV

  1. #1

    Default Split data from irregular CSV

    Hi everybody.

    I have the CSV file below and I need to split this. I tried to use Replace in String, replacing \;{1,} by | (using Regex) after that Split Rows by |, but in the rows 4 and 8, the Date was in Number Field.



    In this case I need to organize like this, with null where I don't have number.




    Can anyone help me?

  2. #2
    Join Date
    May 2016
    Posts
    282

    Default

    If you are getting a malformed CSV with no consistent number of fields you are not going to be able to implement an automatic procedure to organize a coherent output, you'll have to go manually row by row to fix it, nobody is going to be able to help you.

    If you have always the Jun column, and always the Date column with the format dd/mm, you can first clean up the file as you have done so you have something like this:
    Code:
    Jun|Number|Date|Time|Length|Class|Type|Tp
    8931|DISA|31/08|07:02:01|00:00:16|CCR  |ATD|R
    8932|DISA|31/08|07:12:16|00:00:16|CCN  |ATD|R
    8931|31/08|15:26:21|00:01:27|DDD  |ATD|R
    And then go row by row locating the first / character (and this won't work if the Number column can have a / as part of the string, something like D/ISA) and counting the number of | separators to complete the number of fields consistently:
    Code:
    Jun|Number|Date|Time|Length|Class|Type|Tp
    8931|DISA|31/08|07:02:01|00:00:16|CCR  |ATD|R
    8932|DISA|31/08|07:12:16|00:00:16|CCN  |ATD|R
    8931||31/08|15:26:21|00:01:27|DDD  |ATD|R
    But given the inconsistent format of your initial CSV, I wouldn't count on it being the only inconsistency you're going to find.
    Regards
    OS: Ubuntu 16.04 64 bits
    Java: Openjdk 1.8.0_131
    Pentaho 6.1 CE

  3. #3

    Default

    Thanks Ana.

    So, I always have the Jun column and the Date Column, but not always I have the Date Value. This is a report from a then in this case I need to insert a null value between | like your example.

    How can I insert this | between Jun and Date Value? Which step can I use?

    Thanks again.

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    It would have been a LOT easier if you had supplied a sample copy of your input data.
    We can't really be expected to retype your source data can we?

    It looks like some of your fields are "wrappered" in ; (think like enclosure), as well as using ; as a field separator. BAD!
    Last edited by gutlez; 09-05-2018 at 01:29 PM.

  5. #5
    Join Date
    May 2016
    Posts
    282

    Default

    Quote Originally Posted by thiagofred View Post
    How can I insert this | between Jun and Date Value? Which step can I use?
    I would try first with the Formula step, maybe concatenating more than one formula step to create temporary columns to help me know if I need to add an extra | and in which position, it might be done using one Formula step or the Regex Evaluation step and then the Formula step, but I'm not good enough with regular expressions to try
    If I'm not able to do it with the Formula step, then I would try with the Modified Java Script Value, but that is a last resort as it can consume a lot resources and be a bottleneck with big files. I don't think it would be an issue for you, probably your file won't be specially large.
    Regards
    OS: Ubuntu 16.04 64 bits
    Java: Openjdk 1.8.0_131
    Pentaho 6.1 CE

  6. #6
    Join Date
    Aug 2016
    Posts
    290

    Default

    To handle complicated logic and string operations, I'd go straight to java code to do it properly.

  7. #7

    Default

    Thanks for help Ana.

    After I transform comma in |, I used a Modified Java Script step to select Number values where like a date, then I moved the values to correct field.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.