Hitachi Vantara Pentaho Community Forums

Thread: pdi regular expression if else scenario (or capture groups)

  1. #1
    Join Date
    Feb 2014
    Posts
    29

    pdi regular expression (if else scenario - capture groups)

    Hi folks,

    I am trying to extract the datetime part from a list of file names.

    The names can look like this:

    a_b_c_02022018122555.txt
    a_b_c_02_2345_02022018122555_123.txt
    abc02022019122555.txt
    abc_pqr_02022019.txt
    abc_pqr_02022019_123.txt

    I need a single regex that works for all of them. Basically, this is equivalent to: if (14-digit timestamp) then (get the 14-digit timestamp) else (get the 8-digit timestamp). I was trying something like (.*)([0-9]{14})[0-9]{14}|[0-9]{8}.* and can't make it work. The expected output is:
    02022018122555
    02022018122555
    02022019122555
    02022019
    02022019
    Last edited by pdiman; 05-03-2018 at 06:06 PM.
    Best,
    Abin

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    How about:
    .*?([0-9]{8})([0-9]{6})?.*

    Where Capture1 is the date and Capture2 is the time?
    Then combine them afterwards...
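
    For anyone who wants to check the pattern outside of PDI, here is a minimal sketch in Python (not part of the original reply). It assumes the Regex Evaluation step matches the whole field, which `fullmatch` imitates. The function names and the single-group variant `ONE_GROUP` are my own additions for illustration.

```python
import re

# The pattern from the reply above: lazy prefix, 8-digit date, optional 6-digit time.
TWO_GROUPS = re.compile(r".*?([0-9]{8})([0-9]{6})?.*")

# Hypothetical variant (not from the thread): one group that holds
# 14 digits when the time is present and 8 digits otherwise.
ONE_GROUP = re.compile(r".*?([0-9]{8}(?:[0-9]{6})?).*")


def split_timestamp(filename):
    """Return (date, time) as strings; time is None when it is absent."""
    m = TWO_GROUPS.fullmatch(filename)
    return (m.group(1), m.group(2)) if m else None


def timestamp(filename):
    """Return the 14-digit or 8-digit timestamp as one string."""
    m = ONE_GROUP.fullmatch(filename)
    return m.group(1) if m else None


FILES = [
    "a_b_c_02022018122555.txt",
    "a_b_c_02_2345_02022018122555_123.txt",
    "abc02022019122555.txt",
    "abc_pqr_02022019.txt",
    "abc_pqr_02022019_123.txt",
]

for name in FILES:
    print(name, split_timestamp(name), timestamp(name))
```

    Note that both patterns take the first run of 8 digits they find, so a file name with an earlier 8-digit block (an ID, for example) would give the wrong result.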


    Combining the "optional" time with the date was a bit more work than I expected...
    Incoming Stream
    -> RegEx Eval (see above; leave each capture as String!)
    -> If Null (convert a null time to the string 000000)
    -> Select Values (convert the Date & Time strings to Date. NOTE: USE UTC as the Timezone!)
    -> Select Values (convert Date & Time to Number, which will now be in milliseconds since epoch)
    -> Calculator (A+B)
    -> Select Values (change A+B to Date in UTC)
    The resulting timestamp will now match the time in your filenames.

    If you don't use UTC, you will end up with twice your offset (once from Date, once from Time). If you want it in your local time (which is probably better), you can leave the Timezones blank on everything *EXCEPT* the Time (string) -> Time (Date) conversion. Make that UTC and it won't add another offset.
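
    The timezone behaviour is easier to see outside of PDI. Below is a rough Python sketch of the same idea, not the actual transformation. It assumes the file names use ddMMyyyy, and it uses a made-up fixed UTC-5 zone as "local". The names `to_epoch_ms`, `combine_ms`, and `render` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UTC = timezone.utc
LOCAL = timezone(timedelta(hours=-5))  # assumed local zone, for illustration only


def to_epoch_ms(text, fmt, tz):
    """Parse text with fmt, interpret it in tz, return ms since the epoch."""
    stamp = datetime.strptime(text, fmt).replace(tzinfo=tz)
    return (stamp - EPOCH) // timedelta(milliseconds=1)


def combine_ms(date_str, time_str, date_tz, time_tz):
    """Add the date and the time as two millisecond counts."""
    time_str = time_str or "000000"  # same role as the If Null step
    date_ms = to_epoch_ms(date_str, "%d%m%Y", date_tz)  # assumes ddMMyyyy
    # A bare time is treated as a time of day on 1970-01-01.
    time_ms = to_epoch_ms("01011970" + time_str, "%d%m%Y%H%M%S", time_tz)
    return date_ms + time_ms


def render(ms, tz):
    """Format an epoch-millisecond value as wall-clock time in tz."""
    moment = EPOCH + timedelta(milliseconds=ms)
    return moment.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")


# UTC everywhere: matches the file name.
print(render(combine_ms("02022018", "122555", UTC, UTC), UTC))
# Local everywhere: the offset is counted twice, so the result is 5 hours off.
print(render(combine_ms("02022018", "122555", LOCAL, LOCAL), LOCAL))
# Local date, UTC time: only one offset, so the local result is correct.
print(render(combine_ms("02022018", "122555", LOCAL, UTC), LOCAL))
```

    With these assumptions the three lines print 12:25:55, 17:25:55, and 12:25:55 on 2018-02-02, which is the "twice your offset" effect and its fix.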
    Last edited by gutlez; 05-03-2018 at 08:11 PM.

  3. #3

    Hi gutlez,

    I guess that in the Calculator step you are adding the two columns that went through the string-to-date and date-to-number conversions. Why are we doing this?

    Also, what is epoch?
    Regards,
    Dileep

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Epoch is "the beginning of time in Unix", i.e. 1970-01-01 00:00:00 UTC.
    So if you have a date (e.g. 2018-02-02) expressed as ms since the epoch, and a time (e.g. 12:25:55) also expressed as ms since the epoch, then you can simply add the two millisecond values (2018-02-02 + 12:25:55) to yield a date and time.
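
    To put numbers on that, here is a small worked example in Python (my own illustration, everything in UTC). The time 12:25:55 is taken from the 122555 in the first file name.

```python
from datetime import datetime, timezone

# 2018-02-02 00:00:00 UTC as milliseconds since the epoch.
date_ms = 1_517_529_600_000

# 12:25:55 as milliseconds since midnight, i.e. since 1970-01-01 00:00:00 UTC.
time_ms = (12 * 3600 + 25 * 60 + 55) * 1000  # 44_755_000

# Adding the two counts gives the combined date and time.
total_ms = date_ms + time_ms  # 1_517_574_355_000
combined = datetime.fromtimestamp(total_ms / 1000, tz=timezone.utc)

print(combined.isoformat())  # 2018-02-02T12:25:55+00:00
```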

    Formula step *should* be able to do this too, and should be easy, but it doesn't work. I should really file a Jira on that.
