Hitachi Vantara Pentaho Community Forums

Thread: Find characters in a list

  1. #1


    Hello.

    I have an issue that I can't solve.

    I have a field like the one in the image below, from which I need to return one of the values W, T, @, CH or SAC.

    So I need to check whether one of these values appears in the first 4 or 5 characters, but the field does not follow a consistent pattern.

    https://i.ibb.co/SxstqgG/lista-obs.png

    Can anyone help me?


  2. #2

    How many lines are you hoping to push through it in a run?
    How often will you be running the data?

    How often will the characters that you're testing change?
    How often will the number of characters you're looking in change?

    You could:
    Strings cut into a new field (TestChars), chars 0-4
    Filter rows where TestChars contains W or TestChars contains T or TestChars contains @ or TestChars contains CH or TestChars contains SAC

    If you need to populate a Flag field (Chars Found) you can route both the True and the False to "Add Constants" steps (two of them!) to add your "True" and "False" flags and then merge the streams back together.


    This should be a pretty performant path, but better options might exist (Formula step, UDJC, etc)
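
    The cut-then-filter logic above can be sketched outside of PDI to see how it behaves. This is only an illustration in Python, not a PDI step: the function name `has_source_marker`, the prefix length of 5, and the sample rows are assumptions made up for the example.

```python
# Sketch of the "Strings cut" + "Filter rows" logic, outside of PDI.
# TOKENS comes from the thread; has_source_marker, PREFIX_LEN and the
# sample rows are hypothetical and exist only for illustration.
TOKENS = ("W", "T", "@", "CH", "SAC")
PREFIX_LEN = 5  # the thread says "first 4 or 5 characters"


def has_source_marker(text, prefix_len=PREFIX_LEN, tokens=TOKENS):
    """Return True if any token occurs in the first prefix_len characters."""
    if text is None:
        return False
    test_chars = text[:prefix_len]  # plays the role of the TestChars field
    return any(token in test_chars for token in tokens)


rows = ["W - asked for price", "SAC: complaint", "no marker here", None]
flags = [has_source_marker(r) for r in rows]
```

    Note that a plain "contains" test is case-sensitive and also flags ordinary words that happen to start with one of the letters (for example a note beginning with "Total"), so the real data may need a stricter rule.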

  3. #3


    In the first run, I expect to import a sheet with 200,000 historical lines. After that I will import 20,000 lines daily.

    Each price quote has an entry (OBSERVACAO) with long free text filled in by users. This field may contain information about the source (W - WhatsApp, T - Telephone, @ - e-mail, CH - chat, and SAC), in which case I need to return it. It may also lack this information and hold something else, in which case I need to return null.

    I will try this method, but even if I flag the true rows, I still need to return the source information.

    Thanks
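
    Returning the marker itself, or null when there is none, could look roughly like the sketch below. It is Python rather than a PDI step, and it makes an assumption the thread does not confirm: that the marker sits at the very start of OBSERVACAO (after optional spaces) and is not immediately followed by a letter or digit. The name `extract_source` is made up for the example.

```python
import re

# Longer tokens come first so "SAC" and "CH" are tried before single letters.
# Assumption (not confirmed by the thread): the marker starts the text and
# is not directly followed by another letter or digit.
SOURCE_RE = re.compile(r"^\s*(SAC|CH|W|T|@)(?![A-Za-z0-9])")


def extract_source(text):
    """Return the source marker at the start of text, or None if absent."""
    if not text:
        return None
    match = SOURCE_RE.match(text)
    return match.group(1) if match else None
```

    With this rule a note such as "Total 10" yields None instead of T, because the T is followed by a letter. The same regular expression idea should carry over to a regex-capable step, but the exact pattern depends on how the real data is written.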

  4. #4

    Quote, originally posted by gutlez:
    "This should be a pretty performant path, but better options might exist (Formula step, UDJC, etc)"

    A little side-note: I found the Formula step to give poor performance on big data. Not sure if this was because I'm using an older version or perhaps the formula I used was long/complicated.

  5. #5

    Quote, originally posted by thiagofred:
    "I still need to return the source information."

    So... What if some creative type enters W@TCH as the first 5 characters?
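
    To make the W@TCH problem concrete, here is a small Python sketch that lists every marker found in the first 5 characters instead of stopping at the first hit. The function name `markers_in_prefix` and the sample note are hypothetical.

```python
# Token list from the thread; the order only affects the order of the result.
TOKENS = ("SAC", "CH", "W", "T", "@")


def markers_in_prefix(text, prefix_len=5, tokens=TOKENS):
    """List every token that occurs in the first prefix_len characters."""
    prefix = (text or "")[:prefix_len]
    return [token for token in tokens if token in prefix]


found = markers_in_prefix("W@TCH rest of the note")
# More than one marker means the row cannot be classified by "contains" alone.
is_ambiguous = len(found) > 1
```

    For this input four different markers match, so some rule is needed to choose between them (for example, position in the text or a fixed priority), or such rows should be routed aside for review.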

  6. #6

    Quote, originally posted by Sparkles:
    "A little side-note: I found the Formula step to give poor performance on big data. Not sure if this was because I'm using an older version or perhaps the formula I used was long/complicated."

    Which sense of big data do you mean?
    Formula can be pretty slow if you're doing string searches in a very long field, but as far as I've experienced (note: I only really go up to the 100K rows level), it's fairly efficient. UDJC is more efficient, but takes longer to develop and takes more work to maintain. The balancing act is cost (in minutes) per run vs cost (in minutes) of development/maintenance.
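
    That trade-off can be put into a rough break-even calculation. The sketch below is Python, and the numbers in it are purely hypothetical placeholders, not measurements from any PDI run.

```python
import math


def runs_to_break_even(extra_dev_minutes, minutes_saved_per_run):
    """Runs needed before the faster but costlier option pays for itself."""
    if minutes_saved_per_run <= 0:
        return None  # the costlier option never pays off
    return math.ceil(extra_dev_minutes / minutes_saved_per_run)


# Hypothetical inputs: 240 extra minutes of development and maintenance,
# 2 minutes saved on each run.
runs = runs_to_break_even(240, 2)
```

    With a daily run, those made-up figures would mean roughly four months before the extra development effort is recovered.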

  7. #7

    I don't remember exactly because I had to discard it, but it was a simple mathematical expression with rounding. A Calculator step was not useful here because the calculation consisted of 4-5 smaller calculations. I'm dealing with much more than 100k rows though, and once the Formula step was replaced, there was a significant boost in speed.

