Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Data Validator question

  1. #1
    Join Date
    Dec 2015
    Posts
    4

    Default Data Validator question

    Hi All,

    I have a question about data validate function and would like to ask for help.

    The original Data set I have:
    Customer Total
    Andy 550
    Brian 300
    Cindy BBC


    I am using "data validator" function and set "Total" Column accepted only "numeric" data. Then,I would like to use "set field value to constant" and replace all non-numeric row to "blank".

    However,I cannot got the result as I expected. Can anyone know how to deal with it?


    Result I received:
    Customer Total
    Andy
    Brian
    Cindy


    The expected result I would like:
    Customer Total
    Andy 550
    Brian 300
    Cindy

  2. #2

    Default

    you could use a regex that would find the alphabets and then replace those in blank as u expect it to.

    Thanks,
    Santosh

  3. #3
    Join Date
    Dec 2015
    Posts
    4

    Default

    Quote Originally Posted by santoshbhagavatula View Post
    you could use a regex that would find the alphabets and then replace those in blank as u expect it to.

    Thanks,
    Santosh
    In fact,

    I would like to replace all numberic or characters of the column and keep all the date only. How can I do??

    original Data set I have:
    Customer Date
    Andy
    05/12/2015
    Brian
    05/12/2015
    Cindy NA


    Expected Result
    Customer Date
    Andy
    05/12/2015
    Brian
    05/12/2015
    Cindy

  4. #4
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    I would set the Date as a STRING and use a Replace in String step.
    Then after conversion you can use a Select Value step to convert that field into a Date field.
    -- Mick --

  5. #5
    Join Date
    Aug 2011
    Posts
    360

    Default

    I'll read the field as string, then do a regex with a pattern matching only valid dates format.
    Then with a filter step redirect every rows that don't match the pattern to a stream where you set your blank value.
    Finally assemble the two streams (just make point both hops to the same step).

    Alternatively, i m not sure if it works: use a metadata step to change your string field to a date field with appropriate
    format, and put an error handling stream on this step. Every rows in error should be the one with bad dates.
    You can then set the field to blank value. Maybe this is the fastest

  6. #6
    Join Date
    Dec 2015
    Posts
    4

    Default

    Quote Originally Posted by Mathias.CH View Post
    I'll read the field as string, then do a regex with a pattern matching only valid dates format.
    Then with a filter step redirect every rows that don't match the pattern to a stream where you set your blank value.
    Finally assemble the two streams (just make point both hops to the same step).

    Alternatively, i m not sure if it works: use a metadata step to change your string field to a date field with appropriate
    format, and put an error handling stream on this step. Every rows in error should be the one with bad dates.
    You can then set the field to blank value. Maybe this is the fastest
    Hi Mathias,

    I am newbie to Pentaho, can you tell me how can I do a regex with a pattern matching only?

    Currently, I use the replace by String and set the criteria to replace all"[a-zA-Z]" to "blank". And it is fine to replace all character value but I would like to know how can I replace other data format?(Especially for numeric items.)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.