Hitachi Vantara Pentaho Community Forums

Thread: Delimiter Count !!

  1. #1
    Join Date
    Mar 2013
    Posts
    66

    Default Delimiter Count !!

    Hi,

    Thanks in advance for your support.

    I have an input file that uses "|" as the delimiter. How can I get the number of delimiters on each line?

    So far I have used a UDJC (User Defined Java Class) step that counts the delimiters in each line. Please see the attachments.

    Any other suggestions or improvements would be much appreciated.

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Question: why do you want to count how many delimiters you have for each row?
    They should be the same!
    Or do you want to check if a row has an EXTRA delimiter which causes the data to move into the wrong column?
    -- Mick --

  3. #3
    Join Date
    Nov 2008
    Posts
    777

    Default

    Originally posted by harikumar_23:
    Till now, I have used UDJC class that counts the delimiter in the line. Please find the attachments.

    Any other suggestions/improvements, much appreciated
    I think a UDJE (User Defined Java Expression) step would be more suitable for this, because a single expression is much simpler to maintain than the UDJC boilerplate code. Since the PDI distribution already includes the Apache Commons Lang library, all you have to do is set the "Java expression" to the following, where input is the name of the field that holds the whole line:

    Code:
    org.apache.commons.lang.StringUtils.countMatches(input, "|")
    Using this method, I get these results when I use your Unix-delimited text file as input:

    [Image: delimiter_count.jpg]


    input del_count
    1111111|1212121221|aaa|rrr||rrrr|123456789||123456789|123456789||0|123456789|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 124
    2222222|2121212121|sss|ttt||rrrr|123456789||123456789|123456789||0|123456789|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 144
    3333333|3231211122|ccc|hhh||rrrr|123456789||123456789|123456789||0|123456789|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 132
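
    For anyone who wants to try the counting logic outside of PDI, here is a minimal plain-Java sketch that is meant to behave like the countMatches call above (non-overlapping matches, 0 for null or empty input). The class name DelimiterCounter and its method are made up for illustration; they are not part of PDI or Apache Commons.

```java
// Hypothetical helper class for illustration; not part of PDI or Apache Commons.
final class DelimiterCounter {

    // Counts non-overlapping occurrences of delimiter in line.
    // Returns 0 when either argument is null or empty.
    static int countMatches(String line, String delimiter) {
        if (line == null || line.isEmpty() || delimiter == null || delimiter.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        // indexOf returns -1 once no further match exists.
        while ((idx = line.indexOf(delimiter, idx)) != -1) {
            count++;
            idx += delimiter.length(); // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("a|b||c", "|")); // prints 3
    }
}
```

    Inside a transformation the one-line UDJE expression is still the simpler choice; the sketch only shows what the call is doing.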
    Last edited by darrell.nelson; 12-12-2013 at 10:38 AM.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  4. #4
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Darrell,
    that's brilliant!
    -- Mick --

  5. #5
    Join Date
    Mar 2013
    Posts
    66

    Default

    The requirement is that if the number of delimiters in a record exceeds a threshold value, that record has to be excluded from processing.
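
    A rough sketch of that rule in plain Java, assuming "exceeds" means strictly greater than the threshold. The class name, method names, and the threshold of 2 are all hypothetical and only for illustration. In a transformation, the same comparison could probably be done with a Filter rows step on the del_count field produced by the UDJE step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; class, method names, and threshold are illustrative only.
final class DelimiterThresholdFilter {

    // Counts how many times the delimiter character occurs in the line.
    static int countDelimiters(String line, char delimiter) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                count++;
            }
        }
        return count;
    }

    // Keeps only the lines whose delimiter count does not exceed maxDelimiters.
    static List<String> keepWithinThreshold(List<String> lines, char delimiter, int maxDelimiters) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (countDelimiters(line, delimiter) <= maxDelimiters) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a|b|c", "a|b|c|d|e", "x|y");
        System.out.println(keepWithinThreshold(lines, '|', 2)); // prints [a|b|c, x|y]
    }
}
```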


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.