Hitachi Vantara Pentaho Community Forums
Thread: Data Audit (File Content)

  1. #1
    Join Date
    Apr 2008
    Posts
    1,771

    Data Audit (File Content)

    Hi.
    Very simple question.

    Using Kettle, is it possible (at any step) to produce a data audit of a field content?

    Example:
    I have a file with a field called Gender.
    I import the file into Kettle, then I remove records based on a different field (date of birth).

    Can I produce a table which tells me how many males/females there are in the original file and how many after I removed those records?

    Can I do it with Kettle/Spoon or do I need some other products (such as Aggregation Designer) and which one?

    Note that I need a very simple report, not complex chart.

    Thanks.
    Mick.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Simple answer... no, it's not possible.

    Regards,
    Sven

  3. #3
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Sven.
    Is there any plan to add a way to do a quick pass through a data file (or database table) and produce frequencies (or statistics, for numeric fields)?

    If not, is there a step that does something similar to a "distinct" (and count) function in SQL, so that I could create a file containing the information I am looking for?

    Referring to my previous example, if I use:
    select distinct gender, count(*) as frequency
    from mytable
    group by gender

    I would obtain:
    Male,245
    Female,146
    Unknown,20

    If such a step does not exist, where should I start reading for examples so that I can create one?

    Thanks a lot.
    Michele
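
The before/after frequency table described in the posts above can be sketched outside Kettle in a few lines of Python. This is only an illustration of the idea, not a Kettle step or anything from the planned profiling software: the sample records, the field names `gender` and `date_of_birth`, the cutoff date, and the helper names `frequency` and `audit` are all assumptions made up for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"gender": "Male", "date_of_birth": date(1970, 5, 1)},
    {"gender": "Female", "date_of_birth": date(1985, 3, 12)},
    {"gender": "Male", "date_of_birth": date(1992, 7, 30)},
    {"gender": "Unknown", "date_of_birth": date(1965, 1, 9)},
    {"gender": "Female", "date_of_birth": date(1999, 11, 2)},
]


def frequency(rows, field):
    """Count how often each value of `field` occurs (like GROUP BY field with COUNT(*))."""
    return Counter(row[field] for row in rows)


def audit(rows, field, keep):
    """Return (value, count_before, count_after) tuples; `keep` decides which rows survive."""
    before = frequency(rows, field)
    after = frequency([row for row in rows if keep(row)], field)
    # A Counter returns 0 for values that no longer occur after filtering.
    return [(value, before[value], after[value]) for value in sorted(before)]


# Assumed filter: drop every record with a date of birth before 1980.
report = audit(records, "gender", lambda row: row["date_of_birth"] >= date(1980, 1, 1))

for value, n_before, n_after in report:
    print(f"{value},{n_before},{n_after}")
```

Each output line has the form value,count_before,count_after, which is the "very simple report" asked for in the first post. Values that disappear entirely after filtering are still listed, with a count of 0.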

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Yes, we should have that available by Q1 2009 as part of the data profiling software we're currently writing.

  5. #5
    Join Date
    Apr 2008
    Posts
    1,771

    "Yes, we should have that available by Q1 2009 as part of the data profiling software we're currently writing."

    That's great news!

    Any beta or alpha version of the software that I can download and try?

    Michele

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Ask in a month; we're prototyping and doing the architecture now.
    All in all it's pretty simple technology-wise, as it's going to use Kettle transformations as the backbone (in-line ETL), but I would rather get things to a usable state before I splash it around. My personal feeling is that this is better for projects that don't have that much interest from the community.

    Matt
