Hitachi Vantara Pentaho Community Forums

Thread: Data Profiling (DataCleaner) & Data Quality with Kettle

  1. #1

    Data profiling was already possible in Kettle in a simple way: open the Database Explorer, choose a table, and select Data Profile from the right-click context menu. The result was basic information about the data, such as Min, Max, and Count for strings, plus some additional information for numeric data, but these were only basic metrics. We now have a much better solution:

    Human Inference (DataCleaner) and Pentaho (Kettle) worked together to integrate their tools, and the result is a seamless integration of DataCleaner into Kettle. An introductory sample and an FAQ can be found at "Kettle Data Profiling with DataCleaner": http://wiki.pentaho.com/display/EAI/...th+DataCleaner
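
    For comparison, the basic metrics of the older Database Explorer profile (Min, Max, Count, plus a little extra for numeric columns) amount to roughly the following. This is only an illustrative Python sketch, not Kettle or DataCleaner code; the helper name `profile_column`, the sample data, and the choice of null count and mean as metrics are assumptions made for the example.

```python
from statistics import mean


def profile_column(values):
    """Return basic profile metrics for one column; None stands for NULL.

    Hypothetical helper for illustration, not part of Kettle or DataCleaner.
    """
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
    # Numeric columns get one additional metric (assumed here: the mean).
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if is_numeric:
        profile["mean"] = mean(non_null)
    return profile


# Made-up sample columns: one string column with a NULL, one numeric column.
names = ["Jens", "Max", None, "Anna"]
amounts = [10, 25, 5]

name_profile = profile_column(names)
amount_profile = profile_column(amounts)
print(name_profile)
print(amount_profile)
```

    Metrics like these say little about patterns, formats, or value distributions, which is the gap a dedicated profiling tool is meant to fill.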

    Additionally, Data Quality steps are available for:

    - Name Validation, Standardization and Cleansing
    - Address Validation, Standardization and Cleansing
    - E-Mail and Telephone Validation, Standardization and Cleansing
    - Duplicate Detection and Merge Duplicates

    The solution is available immediately and can be downloaded as a plug-in for the existing Pentaho Data Integration / Kettle releases 4.2.x and later,
    see: http://wiki.pentaho.com/display/EAI/Human+Inference

    We look forward to your experiences!

    Cheers,
    Jens

  2. #2

    Really nice. We have already used DataCleaner (though an older version). The integration into PDI makes life easier.

    Cheers Max
