Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Split inconsistent address field

  1. #1
    Join Date
    Oct 2017
    Posts
    3

    Default Split inconsistent address field

    Hi, I have just started working with PDI two weeks ago, and there is something I can't figure out (even after reading other posts about this topic).
    I have a column with an address, which can have either of the formats below:
    Aagje Dekenlaan 1
    Jacoba van Beierenlaan 1
    Buitenwatersloot 1 B
    Jacoba van Beierenlaan 1 B 1
    Jacoba van Beirenlaan 1 B-1
    Jacoba van Beirenlaan 1 B1

    Where each "1" could have a value of 1-1000
    I would like to split this column into three columns, but I can't seem to make it work correctly. I have tried the regex function with different suggestions I found on Google, but it seems that the inconsistent way in which the data in this column is organised messes every function I have tried up.
    The way I would like to organise it:
    Aagje Dekenlaan 1
    Jacoba van Beierenlaan 1
    Buitenwatersloot 1 B
    Jacoba van Beirenlaan 1 B 1
    Jacoba van Beierenlaan 1 B-1
    Jacoba van Beierenlaan 1 B1
    So only words in the first column, only numbers in the second, and what is left in the third.

    Is this even possible in PDI? And if so, which expression should I use for it?

  2. #2
    Join Date
    Sep 2011
    Posts
    152

    Default

    Please find attached transformation, i have used regex to split your source field to three target fields, hope this helps.

    Kindly Let me know if it works.
    Attached Files Attached Files

  3. #3
    Join Date
    Oct 2017
    Posts
    3

    Default

    It worked perfectly, thank you very much!

  4. #4
    Join Date
    Oct 2017
    Posts
    3

    Default

    I just found out that some rows in my data are actually as follows:
    Jacoba van Beierenlaan 1-15 B1
    I have been trying to add this condition to the Regex expression, so that the second column would become 1-15, but I can't make it work. Could you please help me again?

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Add the poor man's dash (minus) to the set of allowed characters for the second field: [0-9\-]
    So long, and thanks for all the fish.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.