Hitachi Vantara Pentaho Community Forums
Results 1 to 9 of 9

Thread: Using "Replace in String"

  1. #1

    Default Using "Replace in String"

    Hello Everybody,

    I am new to Pentaho Kettle CE 4.4 (Stable)
    My requirement is to read the Cust_Name from an excel input and replace the below characters/string in the values.

    1. Any kind of extra space/s
    2. " AND " starting and ending with space/s
    3. " & " starting and ending with space/s
    4. "." Dots just mere dots because any space before and after would be taken care by point number 1
    5. "-" Dash
    6. M/s

    Now I did use Replace in String transformation, however the output isnt as expected.
    Spoon converts and deletes even the alphabets before and after dots, dashes,etc. which have been mentioned above.
    In few cases it doesnt remove the dots (.)

    Do we have a precedence or an order for this?

    Would appreciated any kind of answer!!!
    Thank you

    Cheers
    Ronak Shah

  2. #2
    Join Date
    Feb 2011
    Posts
    840

    Default

    Hey Ronak! officially, there's no "order" on RiS, and it's "wrong" to alter the same field more than once - I have no idea as to the why, I've just been pointed this before and am passing on =p

    But in my experiments I've found out that you can, indeed, but it seems all the alterations try to take place at the same time - so if there are overlapping rules, you're better off with 2 or more RiS steps in sequence. It'd be better if you gave us some example data and how you're trying to configure the step. One example I have at hand now is this:
    In stream field Out stream field use RegEx Search Replace with Set empty string? Replace with field Whole Word Case sensitive
    observations N DLO_TIPO ${DLO_TIPO} N N N
    observations N ANO ${ANO} N N N
    observations N MES ${MES} N N N

    If you try this first, then this, than this other one, it'll mess up. One replacing rule can't depend on another, be it upper or lower. If you have replacing dependencies, do multiple RiS steps.
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert.Take my comments at your own risk.

  3. #3

    Default

    Hey joao.ciocca,

    Thanks a lot for your response...Did understand your explanantion..will try that for sure...
    I will give the sample data for your reference, also i have attached the image of the RIS Settings.
    Obviously the name of the industry is different for each row and this is just for sampling purposes.

    Before Transformation After Transformation Expected Output?
    R.I.S.K. Industries . Industries No (RISKIndustries)
    R. ISK. Industries . ISKIndustries No (RISKIndustries)
    R.I..K Industries ..KIndustries No (RIKIndustries)
    R I S K Industries RISKIndustries Yes
    RI&SK Industries RISKIndustries Yes
    RI & SK Industries RISKIndustries Yes
    RI AND SK Industries RIANDSKIndustries Yes for spaces, No for "AND"
    RIANDSK Industries RIANDSKIndustries Yes for spaces, No for "AND"
    M/s RIANDSK Industries sRIANDSKIndustries Yes for spaces, No for "AND" and "M/"
    M/s RIANDSK Industries Ltd. sRIANDSKIndustriesLtd. Yes for spaces, No for "AND" and "M/" and Dot
    M/s RIANDSK Industries Pvt.Ltd. sRIANDSKIndustriesPvtLtd. Yes for spaces, No for "AND" and "M/" and Dot
    M/s RIANDSK Industries Pvt. Ltd. sRIANDSKIndustriesPvtLtd. Yes for spaces, No for "AND" and "M/" and Dot
    Name:  RIS.jpg
Views: 6080
Size:  27.8 KB

    In the above example I'd like to get rid of the following:

    1. Dots (".") includes with spaces and without spaces, also spaces before and after the dot.
    2. "&" sign includes with spaces and without spaces, also spaces before and after the & sign.
    3. "AND" word includes with spaces and without spaces, also spaces before and after the AND.
    4. "M/s" word includes with spaces and without spaces, also spaces before and after the M/s.

    Hope this gives you a clear idea. Thanks once again!!!
    Thank you

    Cheers
    Ronak Shah

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    If you want to apply multiple changes to a field value, don't use Replace-In-String.
    You're much better off with a User-Defined-Java-Expression.
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Feb 2011
    Posts
    840

    Default

    first thing I see is that you've checked "Y" for "Use RegEx", and that's probably one thing that's messing things up.

    marabu is probably also right, but UDJE is a total mistery to me and I can't help you there =p

    but I do think I can help with regex... I guess I've trained enough on that front so far. And since you want to remove those things, not necessarily replace them with specific things, I do think a regex could handle that...

    1. Dots (".") includes with spaces and without spaces, also spaces before and after the dot.
    => that can be found with (\s?\.\s?)
    2. "&" sign includes with spaces and without spaces, also spaces before and after the & sign.
    => this one, similar, (\s?&\s?)
    3. "AND" word includes with spaces and without spaces, also spaces before and after the AND.
    => again, (\s?(AND)\s?)
    4. "M/s" word includes with spaces and without spaces, also spaces before and after the M/s.
    => and once more (\s?(M\/s)\s?)

    Using a single line, set use Regex to Y and find "\s?(\.|&|(\s?AND\s?)|(M\/s))\s?" (no quotes).
    Trying this out here, the result was this:
    Code:
    RISKIndustries
    RISKIndustries
    RIK Industries
    R I S K Industries
    RISK Industries
    RISK Industries
    RISK Industries
    RISK Industries
    RISK Industries
    RISK Industries Ltd
    RISK Industries PvtLtd
    RISK Industries PvtLtd
    Does this help?
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert.Take my comments at your own risk.

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    UDJE isn't rocket science. Instead of a couple of consecutive Replace-In-String steps you just call replace() resp. replaceAll() repeatedly.
    For example: name.replace(" AND ", "").replace("M/s ", "").replaceAll("[\\-\\.\\s&]", "") will get rid of the things mentioned in the opening post.
    So long, and thanks for all the fish.

  7. #7

    Default

    Thanks guys...
    I guess I have found the solution to my problem...
    The RegEx was set to Y [don't know much about RegEx]
    But I did set RegEx to N and Whole Word to N.
    For now it does get me going...still I would like to know RegEx and UDJE.
    Do provide some posts about both of them.
    But thanks for all your help....
    Thank you

    Cheers
    Ronak Shah

  8. #8
    Join Date
    Feb 2011
    Posts
    840

    Default

    Quote Originally Posted by rshah20713 View Post
    For now it does get me going...still I would like to know RegEx and UDJE.
    Do provide some posts about both of them.
    But thanks for all your help....
    I was going to play the "LGTFY card", but nah =p
    here are a couple links I found:
    http://www.codeproject.com/Articles/...Regex-Tutorial
    http://regex.learncodethehardway.org/book/
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert.Take my comments at your own risk.

  9. #9

    Default

    Hey joao.ciocca,

    Thanks for the links...I did google about RegEx and UDJE got this link
    http://wiki.pentaho.com/display/EAI/Regex+Evaluation
    Hopefully I do get more info for the same.
    Thank you

    Cheers
    Ronak Shah

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.