Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Kettle Validation Questions

  1. #1
    Join Date
    Feb 2014
    Posts
    3

    Default Kettle Validation Questions

    Hi. I'm a newbie with Pentaho but an experienced ETL/BI professional. I'm attempting to do a simple validation.
    My transformations are:
    1. A simple two steps transformation in order to load de CSV input file into a MySQL -all varchars- RAW table (it works)
    2. Then a second transformation to validate input Data Type. In this case I'm casting the input. It looks like this:


    Name:  trans2.jpg
Views: 75
Size:  11.9 KB

    The input is casted from the RAW (all varchars) table into the corresponding data types, like this:

    SELECT ID_RAW, CAST( CONCAT( '20', SUBSTRING( trim( FECHAVTA ) , 7, 2 ) , '-',
    SUBSTRING( trim( FECHAVTA ) , 4, 2 ) , '-',
    SUBSTRING( trim( FECHAVTA ) , 1, 2 ) ) AS DATE ) AS FECHAVTA,
    cast(trim( ID_CLIENTE ) as signed) AS ID_CLIENTE,
    cast(trim( ID_PAISVTA ) as signed) AS ID_PAISVTA,
    cast(trim( NROSUCVTA ) as signed) AS NROSUCVTA,
    cast(trim( ID_ARTICULO) as signed) AS ID_ARTICULO,
    cast(trim( CANTIDAD ) as decimal(5)) AS CANTIDAD,
    cast(trim( PRECIOESPECIAL ) as decimal(7,2)) AS PRECIOESPECIAL
    FROM rawprueba1


    Okay then, the validation step doesn´t work well because I checked the box "Validate Data Type" for every validation of every field. Using a correct file it takes the whole input file as in error inspite of I casted the input for all of them. At the same time if I uncheck all input is accepted disregarding that I entered some numeric data as alpha and the result set of input contains null values for those fields (however I marked he "Only numeric data is expected"

    In short. what am I doing wrong? Kettel seems to be intuitive but in the end is full of tricks.

    Can anybody help me please?

    Thanks in advance,

    Marcelo

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Hi.
    It's better if you zip your transformation and you upload it here as attachment, so that someone can check your settings.
    But have you checked at the sample in your installation directory?
    Maybe that will give you a clue.
    -- Mick --

  3. #3
    Join Date
    Feb 2014
    Posts
    3

    Default

    Thanks Mick_data, you are write. Sory the names are in Spanish my language but I think everything is understandable..

    Here you have all the files the first ktr is to low raw data into a table RAW for further vlidation the second ktr is the one that fails as explained before.
    The file pentahotest.sql.txt must be renamed without txt leaving only the SQL as extesion and it is a MySQL dump of all my test data. Ypou can easiky import all into a MySQL DB

    Thanks a lot for your help.

    Marcelo
    Attached Files Attached Files

  4. #4
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Hi Marcelo.
    What about using a Regexp expression to validate your data?
    Last edited by Mick_data; 02-13-2014 at 11:36 AM. Reason: Incorrect info given
    -- Mick --

  5. #5
    Join Date
    Feb 2014
    Posts
    3

    Default

    Mick,
    I'll give it a try, but the problem here is that the product does not work as it is stated in instructions manual. So okay, I can professionaly look for a workaround but the real problem here is that Pentaho functionality is unstable, completely unstable. Then I began to consider it is not a professional stable tool to build real professional complete ETL solutions. This kinds of lacks leaves Pentaho in the far corner of simple transformation like RAW tables loaders o in the end to populate dimensions. Here I can think that it worth to open a new thread like "Is it actually productive to use Pentaho for DI or is it better to have a stable layer well coded either in Java, or PHP, or Python or Perl and controlled by a well tested Jos Scheduler Environment???" To me, at least, after a week of testing it is a total waste of time.

    Marcelo

  6. #6
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Hi Marcelo.
    I think that you're overreacting.
    Using a regexp is a functionality built in within the Validator step, so you don't need to change anything in your transformation, only a rule.
    If you think that there is a bug in a specific step, then it's fine. Obviously no software is free from any bugs, but the advantage of Pentaho is that you can report them http://jira.pentaho.com/ or even write your own patch.

    I'm sure that PDI has been used by many people to build
    real professional complete ETL solutions.
    I'll try to spend some time with your data and check why it
    does not work as it is stated in instructions manual
    .
    -- Mick --

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.