Hitachi Vantara Pentaho Community Forums

Thread: Regex Evaluation Step

  1. #1
    Join Date
    Apr 2008
    Posts
    1,771

    Regex Evaluation Step

    Hi.
    I have tested my regular expression on http://myregexp.com/signedJar.html

    but when I use it in the Regex Evaluation step (RES) it returns different matches.
    For example, on that website "PE1 2UX" is a match, but in the RES it is not.

    Has anyone already done a comparison between them?

    One more question: if I do NOT check the box "Create fields for capture group", why does the step fail, stating that the number of fields created (0) does not correspond to the number of groups (4)?

    Thanks.
    Mick

  2. #2
    Join Date
    Dec 2009
    Posts
    609

    Hi,

    What does your regex look like? And which values are you testing it against?

    Cheers,

    Tom

  3. #3
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Tom.
    In the end I decided to change my regexp!

    Out of frustration I deleted the old regexp.

    But I am still puzzled by the requirement to create fields for the capture groups.
    I'll double-check this and file a Jira.

    Mick.

  4. #4
    Join Date
    Dec 2009
    Posts
    609

    Hi Mick,

    I suppose you need to leave out "(" and ")" in your regex when the checkbox "create capture groups" is unchecked.

    Usually, I use regex to extract certain parts of a given string in order to fill the matching parts into new columns. This is what capture groups are used for.
    E.g.:
    Given this kind of string:
    12345678.SOMETHING.ANOTHER_SOMETHING.0001

    and using the Regex:
    (.*)\.(.*)\.(.*)\.(.*)

    and checking the checkbox "create capture groups", you will need to configure 4 new "columns" in the lower section of the Regex step.
    Each new column/line corresponds to one of the "(.*)" parts of the regex...

    So again: as long as you have "()" groups in your regex, I suppose you will have to configure capture groups.
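
To illustrate the example above outside of PDI, here is a minimal sketch in plain Java. It assumes the Regex step behaves like `java.util.regex`, which this thread does not confirm; the class name `CaptureGroupDemo` and the method `capture` are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Regex from the post: four groups separated by literal dots.
    static final Pattern PATTERN = Pattern.compile("(.*)\\.(.*)\\.(.*)\\.(.*)");

    /** Returns the captured groups, or an empty array if the whole value does not match. */
    static String[] capture(String value) {
        Matcher m = PATTERN.matcher(value);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i); // group 0 is the whole match, so start at 1
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints one line per capture group, i.e. one line per new "column".
        for (String g : capture("12345678.SOMETHING.ANOTHER_SOMETHING.0001")) {
            System.out.println(g);
        }
    }
}
```

Each of the four printed values corresponds to one of the four fields that would have to be configured in the step.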

    HTH,

    Tom

  5. #5
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Tom,
    thanks for your suggestion.
    I think that if there are () in the regex, PDI forces you to capture the groups even if you do not want to.
    I am a complete regex newbie and it could be that my expression would work even without (), but I still think that unless you ask PDI to capture those groups, that data should be discarded.
    Note that I use regex to validate my data, not to split the string, so I do not care about capture fields.

    At the moment I create the capture fields and then discard them using a "Select values" step.
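
One possible alternative, sketched here in plain Java, is to keep the parentheses for grouping but mark them as non-capturing with `(?:...)`. The two patterns below are hypothetical, and whether the PDI step then accepts an unchecked capture box is an assumption that has not been tested in this thread; the sketch only shows how `java.util.regex` counts the groups.

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    /** Number of capturing groups the pattern defines. */
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Hypothetical patterns, not taken from the thread.
        System.out.println(groupCount("(AB|CD)([0-9]{2})"));     // capturing groups
        System.out.println(groupCount("(?:AB|CD)(?:[0-9]{2})")); // non-capturing groups
    }
}
```

Both patterns accept the same strings; only the number of groups reported differs (2 for the first, 0 for the second).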

    Mick

  6. #6
    Join Date
    Dec 2009
    Posts
    609

    Hi Mick,

    Well, if you could post an example value and the regex you would like to test, I could help you verify whether PDI has a problem there or whether it's just a configuration issue of the Regex step.

    Cheers,

    Tom

  7. #7
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Tom.
    My regexp: [A-Z]([A-Z0-9]|\s){2,3}\s[0-9]([A-Z]{2})
    and I am trying to validate UK postcodes:
    AB10 1AX
    AL5 3DD
    B1 3EG
    B10 9DH

    As I wrote before, I do not need to capture any groups, only to validate the postcode format.
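
For anyone who wants to reproduce this outside of PDI, here is a rough sketch in plain Java that runs the regex above against the sample values. It assumes the step uses `java.util.regex` and requires the whole value to match, neither of which is confirmed in this thread; the value with a trailing space is a hypothetical addition to show how a whole-value match can differ from a "find anywhere" match, which may be one reason an online tester and the step disagree.

```java
import java.util.regex.Pattern;

public class PostcodeCheck {
    // Regex exactly as posted above.
    static final Pattern POSTCODE =
        Pattern.compile("[A-Z]([A-Z0-9]|\\s){2,3}\\s[0-9]([A-Z]{2})");

    /** True only if the entire value matches the pattern. */
    static boolean wholeValueMatches(String value) {
        return POSTCODE.matcher(value).matches();
    }

    /** True if the pattern matches anywhere inside the value. */
    static boolean containsMatch(String value) {
        return POSTCODE.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] values = {
            "AB10 1AX", "AL5 3DD", "B1 3EG", "B10 9DH", "PE1 2UX",
            "PE1 2UX " // hypothetical: same postcode with a trailing space
        };
        for (String v : values) {
            System.out.println("[" + v + "] whole=" + wholeValueMatches(v)
                + " anywhere=" + containsMatch(v));
        }
    }
}
```

Run as written, "B1 3EG" is rejected by both checks, because the pattern requires at least two characters between the first letter and the separating space. The trailing-space value passes the "anywhere" check but fails the whole-value check.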

    Have fun!
    Mick

  8. #8
    Join Date
    Dec 2009
    Posts
    609

    Hi Mick,

    Hmm... this seems to work for me. See the attached transformation (I used PDI 4.1.0 GA):
    regex1.ktr

    Cheers,

    Tom


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.