Hitachi Vantara Pentaho Community Forums

Thread: Handling tab separated values and comma separated values in text file input

  1. #1

    Handling tab separated values and comma separated values in text file input

    Hi,

    I am using the text file input step to read a .txt file. The file can be either tab separated or comma separated.
    Both variants have the same fields. How can I handle this in the ETL?
    We do not know in advance whether the user will pass a tab separated or a comma separated file to the transformation.


    Thanks
    Ajinkya

  2. #2

    Knowing the metadata

    You should always know the metadata of your input in advance. If you cannot, try using the metadata injection step.


    Regards,
    Dileep

  3. #3

    Hi,
    I do not understand how the metadata injection step would help.
    Could you please elaborate?

    Thanks,
    Ajinkya

  4. #4

    You create some logic to determine which separator the file is using, and then you inject the separator as metadata.
    Regards
    OS: Ubuntu 16.04 64 bits
    Java: Openjdk 1.8.0_131
    Pentaho 6.1 CE

  5. #5

    Hi
    I am stuck on the logic to determine the separator.
    I used a text file input step with fixed format (so that the whole line is read as a single field) and limit 1,
    so that only the first row of the file is read.
    Then I need to check whether the separator in this first row is a tab, a comma ",", or a pipe "|". Once that is confirmed, I'll set the separator in a variable, which I'll use in my original text file input step in another transformation.
    The question is: how can I determine the separator in the first row? I tried the filter rows step, but it did not work.

    Please suggest any other logic or approach you may have for doing this.
    Thanks
    Ajinkya
    Last edited by Ajinkya; 06-26-2018 at 06:45 AM.

  6. #6

    You would have to count the occurrences of each candidate separator in the line.

    Examples:
    1) "abc,def,ghi" --> count 2 commas
    2) "abc\tdef\tghi" --> count 2 tabs (\t stands for a tab character)

    This logic would not work if the values between the separators can also contain the separator characters.
    Alternatively, preprocess the file first, replacing all tabs with commas, and then you can handle everything the same way.
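
    The counting idea can be sketched outside PDI as follows. This is a minimal Python illustration, not a Kettle step; the function name `guess_separator_by_count` and the candidate list (tab, comma, pipe) are assumptions made for the example. It shares the limitation noted in this post: a separator character inside a field value can skew the counts.

```python
# Hypothetical helper: pick the candidate separator that occurs most often
# in the first line of the file.
CANDIDATES = ["\t", ",", "|"]  # assumed candidates: tab, comma, pipe


def guess_separator_by_count(first_line, candidates=CANDIDATES):
    # Count how many times each candidate appears in the line.
    counts = {sep: first_line.count(sep) for sep in candidates}
    # On a tie, max() keeps the candidate listed first.
    best = max(candidates, key=lambda sep: counts[sep])
    # If no candidate occurs at all, report that instead of guessing.
    if counts[best] == 0:
        return None
    return best
```

    In a transformation, the same comparison would presumably live in a scripting or calculation step that runs on the single first row.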

  7. #7

    Hi,

    No, I do not need a count.

    Please suggest other ways. I do not want to manually replace the separator in the file before using it as input to the transformation.

    Thanks
    Ajinkya

  8. #8

    I'm not saying you should do it manually. I'm saying you could do it automatically.

    Why do you not need to count?

    If you don't want to count, I guess you're only left with the option of checking which of the candidate characters exists in the line.

    Examples:
    1) "abc,def,ghi" --> comma exists
    2) "abc\tdef\tghi" --> tab exists (\t stands for a tab character)
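
    The existence check can be sketched like this. It is a minimal Python illustration under stated assumptions: the names `separators_present` and `pick_single_separator` are hypothetical, and the candidate set (tab, comma, pipe) is taken from the characters mentioned earlier in the thread. The sketch deliberately returns nothing when more than one candidate is present, because a plain existence check cannot decide that case.

```python
def separators_present(first_line, candidates=("\t", ",", "|")):
    # Return every candidate character that appears in the line.
    return [sep for sep in candidates if sep in first_line]


def pick_single_separator(first_line, candidates=("\t", ",", "|")):
    found = separators_present(first_line, candidates)
    # Only accept the result when exactly one candidate is present;
    # zero or several candidates means the check is inconclusive.
    if len(found) == 1:
        return found[0]
    return None
```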

  9. #9

    Hi

    Can you please explain it step by step? I am badly stuck on the logic.


    Thanks
    Ajinkya

  10. #10

    If you replace all separators, you don't need any logic to determine which separator is in the file.
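
    A minimal sketch of that replacement, written in Python rather than as a PDI step. The function name `normalize_to_commas` is hypothetical, and treating both tab and pipe as separators to replace is an assumption. The sketch is naive on purpose: it would corrupt the data if any field value legitimately contains a tab, pipe, or comma.

```python
def normalize_to_commas(text, other_separators=("\t", "|")):
    # Replace every alternative separator with a comma so that the file
    # can always be read as comma separated afterwards.
    for sep in other_separators:
        text = text.replace(sep, ",")
    return text
```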

    If you can't replace all separators, you need to determine what the separator is before extracting data, yes?

    Step 1: Determine what separator is in use
    Step 2: Read the file using the separator from Step 1.
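
    The two steps can be sketched end to end as follows. This is a Python illustration only, assuming tab, comma, and pipe as candidates and using a simple occurrence count for Step 1; `detect_separator` and `read_rows` are hypothetical names. In PDI the detected separator would instead be handed to the second transformation, for example through a variable or metadata injection as discussed above.

```python
import csv
import io


def detect_separator(first_line, candidates=("\t", ",", "|")):
    # Step 1: choose the candidate that occurs most often in the first line.
    return max(candidates, key=first_line.count)


def read_rows(text, candidates=("\t", ",", "|")):
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    sep = detect_separator(first_line, candidates)
    # Step 2: parse the whole content with the separator found in Step 1.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=sep))
```

    Both variants of the same file then yield the same rows, which is what the downstream steps need.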
