Hitachi Vantara Pentaho Community Forums

Thread: CSV file input has '=' characters

  1. #1
    Join Date
    Jun 2015
    Posts
    5

    Question CSV file input has '=' characters

    A CSV input file from a third party sometimes has '=' characters before some of its fields; on other days it does not.

    E.g.
    Code:
    ="01234",BOSTIK,BOS,="80214",="80214",HARD PLASTICS GLUE 20ML TUBE,="0004",="9.99",="9.99",="0.00%",="5000399006190 ",,,
    I'm reluctant to add a Replace In String Step and I'm looking for a general technique.
    How would you cleanse such an input file?

    I'm trying to learn...

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771


    Why are you reluctant to use a Replace in String?
    If those characters are there, they will be removed; if they are not, nothing will happen.
    I would probably parse that file and strip all '=' characters using some other software.
    Do you use Win or Linux?

    An alternative would be to load the file as 1 field, do a Replace in string, then either use a split field step or create a temp text file and read that file with your Text File Input.
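    A minimal sketch of the single-field idea, written in Python rather than as PDI steps. It assumes the wrapper always has the form ="value" and that a wrapped value never contains a double quote. The names WRAPPED, clean_line and split_line are made up for illustration.

```python
import csv
import io
import re

# Matches a wrapper such as ="01234" and captures the inner text.
# Assumption: wrapped values never contain a double quote themselves.
WRAPPED = re.compile(r'="([^"]*)"')

def clean_line(line):
    """Turn every ="value" into "value"; lines without the wrapper are unchanged."""
    return WRAPPED.sub(r'"\1"', line)

def split_line(line):
    """Split one cleaned line into fields with the standard csv module."""
    return next(csv.reader(io.StringIO(line)))

raw = ('="01234",BOSTIK,BOS,="80214",="80214",HARD PLASTICS GLUE 20ML TUBE,'
       '="0004",="9.99",="9.99",="0.00%",="5000399006190 ",,,')
fields = split_line(clean_line(raw))
print(fields)
```

    Keeping the double quotes and dropping only the '=' means a comma inside a wrapped value would still be read as part of that field when the line is split.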
    -- Mick --

  3. #3
    Join Date
    Jun 2015
    Posts
    5


    Quote Originally Posted by Mick_data View Post
    Why are you reluctant to use a Replace in String?
    If those characters are there, they will be removed; if they are not, nothing will happen.
    I would probably parse that file and strip all '=' characters using some other software.
    Do you use Win or Linux?
    I am reluctant because I am picking up around 60 files of different shapes and will get more later. I would like to have a general "whole row replace in string" step available.

    I have written a Replace In String step that applies the same regex search/replace to each field in the problem file, but maintaining one of these for every new file format will be a pain. I can use sed or awk quite happily (I am on Windows, but I have them available). Is it easy to add a step to call an external sed script?
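    To illustrate what a "whole row" version of that replace could look like, here is a rough Python sketch that applies one pattern to every field of a row, whatever the number of columns. It is not a PDI step. UNWRAP and clean_row are hypothetical names, and it assumes the row was already split, so a wrapped value containing a comma would not survive.

```python
import re

# A field that consists entirely of ="..." ; the group captures the inner text.
UNWRAP = re.compile(r'="(.*)"')

def clean_row(row):
    """Unwrap ="value" fields in a row of any length; other fields pass through."""
    cleaned = []
    for field in row:
        match = UNWRAP.fullmatch(field)
        cleaned.append(match.group(1) if match else field)
    return cleaned

# The same function serves rows of different shapes.
print(clean_row(['="01234"', 'BOSTIK', '="9.99"', '']))
print(clean_row(['="0004"', 'a=b']))
```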

    I quite like your other solution and will try it:
    Quote Originally Posted by Mick_data View Post
    An alternative would be to load the file as 1 field, do a Replace in string, then either use a split field step or create a temp text file and read that file with your Text File Input.

  4. #4
    Join Date
    Apr 2008
    Posts
    1,771


    Re: "Is it easy to add a step to call an external sed script?"
    Have a look at the Execute a process step:
    http://wiki.pentaho.com/display/EAI/Execute+a+process
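    For reference, the kind of external script such a step might call could be as small as the Python sketch below, used here in place of sed. It takes an input path and an output path as its two command-line arguments. The names, the argument order and the UTF-8 encoding are assumptions for illustration, and how the step passes arguments is not covered in this thread.

```python
import re
import sys

# Same assumption as a sed one-liner would make: the wrapper is ="value"
# and the value itself contains no double quote.
WRAPPED = re.compile(r'="([^"]*)"')

def clean_lines(lines):
    """Yield each line with ="value" rewritten to "value"."""
    for line in lines:
        yield WRAPPED.sub(r'"\1"', line)

# Runs only when called as: python clean_csv.py <input> <output>
# (clean_csv.py is a hypothetical file name; UTF-8 is an assumed encoding).
if __name__ == "__main__" and len(sys.argv) == 3:
    src_path, dst_path = sys.argv[1], sys.argv[2]
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        dst.writelines(clean_lines(src))
```

    Because files without the wrapper pass through unchanged, the script can be run on every incoming file regardless of its shape.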
    -- Mick --


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.