Hitachi Vantara Pentaho Community Forums

Thread: Regex Evaluation Issues

  1. #1
    Join Date
    May 2017
    Posts
    7

    Regex Evaluation Issues

    Hi Group

    I hope someone can help.

    I have a Regex Evaluation step with the following expression to find the email address within a string:

    ([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)

    The string is as follows

    Username: uceses9Name: Test SharifTelephone (including code): 01344322585Email: Testf958@gmail.com

    The expression works in online regex testers, yet in PDI both the Regex Evaluation step and the Replace in String step return null values.

    Can someone point me in the right direction?

    Thanks in advance
    Chirag

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    It's because PDI adds an implicit ^ and $ to either end of the RegEx,
    so the RegEx that is actually being run is ^([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)$

    Try that in your online verifier.

    Note that your RegEx is not actually RFC 5322 compliant (https://en.wikipedia.org/wiki/Email_address). Testf958+Pentaho@gmail.com will reach your mailbox, but will not pass your RegEx.
    Also note that "Testf958@Pentaho"@gmail.com could actually be a valid email address.

    In short... Don't try to write your own validation of email addresses. Split your incoming string (on space perhaps?) and then put the pieces into the correct place afterwards.
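The implicit anchoring described in reply #2 can be reproduced outside PDI. The sketch below assumes the step behaves like `java.util.regex.Matcher.matches()`, which only succeeds when the whole input matches, whereas most online testers behave like `find()`, which searches for a match anywhere in the input. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // The expression from post #1, unchanged.
    static final String EMAIL =
        "([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\\.[a-zA-Z0-9_-]+)";

    // The sample row from post #1.
    static final String INPUT =
        "Username: uceses9Name: Test SharifTelephone (including code): "
        + "01344322585Email: Testf958@gmail.com";

    // Whole-input match: equivalent to wrapping the regex in ^ and $.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring search: what a typical online tester does.
    static String firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch(EMAIL, INPUT));                        // false
        System.out.println(firstFind(EMAIL, INPUT));                         // Testf958@gmail.com
        System.out.println(wholeMatch(EMAIL, "Testf958@gmail.com"));         // true
        System.out.println(wholeMatch(EMAIL, "Testf958+Pentaho@gmail.com")); // false: '+' is not in the class
    }
}
```

The first two calls show why the same expression can succeed in a tester and still yield null in the step; the last call shows the plus-addressing gap mentioned above.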

  3. #3
    Join Date
    May 2017
    Posts
    7

    Hi
    Thank you for your swift reply. I've used the verifier with the ^ and $ added and it comes up with no matches. The problem is that the text I have is unformatted, with no delimiters, so Regex is my only solution. I just keep getting null as the result. What should the syntax be for ([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+) to work?

    Thanks
    Chirag

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    If you absolutely have to use this RegEx (bad idea!), try:
    .*?([a-zA-Z0-9._+-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+).*?

    Since you know your line contains "Email:", you can redirect your row using a Filter Rows step, and then process just that row using your formatted rules.
    RegEx probably shouldn't be your go-to, especially if you are asking how to build specific RegEx values here.

    Take the time to learn *ALL* the steps; it will make it easier for others to maintain your transformations for you.
    Last edited by gutlez; 05-29-2019 at 03:51 PM.
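A minimal sketch of why the wrapped expression from reply #4 gets past the whole-input requirement: the lazy `.*?` on either side absorbs everything around the address, and capture group 1 holds the address itself. This again assumes the step behaves like `Matcher.matches()`. The local-part class is written here as `[a-zA-Z0-9._+-]`, with the hyphen last so that it is read as a literal; that spelling, and the class and method names, are choices made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WrappedEmailDemo {
    // Lazy wildcards on both sides let matches() succeed on the full row.
    static final Pattern WRAPPED = Pattern.compile(
        ".*?([a-zA-Z0-9._+-]+@[a-zA-Z0-9._-]+\\.[a-zA-Z0-9_-]+).*?");

    // Returns the captured address, or null when the row contains none.
    static String extract(String input) {
        Matcher m = WRAPPED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String row = "Username: uceses9Name: Test SharifTelephone (including code): "
            + "01344322585Email: Testf958@gmail.com";
        System.out.println(extract(row));                                 // Testf958@gmail.com
        System.out.println(extract("Email: Testf958+Pentaho@gmail.com")); // Testf958+Pentaho@gmail.com
        System.out.println(extract("no address in this row"));            // null
    }
}
```

In the step itself, the equivalent would be to read the address from the first capture group rather than from the overall match.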

  5. #5
    Join Date
    May 2017
    Posts
    7

    Thank you, that worked.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.