Hitachi Vantara Pentaho Community Forums

Thread: Which approach - any advice

  1. #1

    Which approach - any advice

    Hi, I'm looking for advice on the best way to manage a large project in Pentaho.

    We're working with a large number of different data streams (over 400) that are published online as CSV files. We have no control over how they are published, so the data is often inconsistent: columns get renamed, extra rows appear above the header, and date formats change.

    The data is transaction data, so even the most complex files have no more than 12 columns, and many have only four or five. There is a lot of overlap between the files: every file contains at least buyer, supplier, value and date columns.
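Because every file shares the same core fields, renamed columns and extra rows above the header can be tolerated by searching for the header row instead of assuming it is the first line. Below is a minimal Python sketch, assuming a hand-maintained list of known column spellings. `ALIASES`, `find_header` and the sample data are hypothetical and are not part of Pentaho.

```python
import csv
import io

# Hypothetical alias table: canonical column -> header spellings seen in the feeds.
ALIASES = {
    "buyer": {"buyer", "purchaser", "department"},
    "supplier": {"supplier", "vendor", "payee"},
    "value": {"value", "amount", "total"},
    "date": {"date", "payment date", "transaction date"},
}


def find_header(rows):
    """Return (row_index, {canonical_name: column_index}) for the first row
    that contains a recognised spelling of every canonical column."""
    for i, row in enumerate(rows):
        cells = [c.strip().lower() for c in row]
        mapping = {}
        for canonical, spellings in ALIASES.items():
            for j, cell in enumerate(cells):
                if cell in spellings:
                    mapping[canonical] = j
                    break
        if len(mapping) == len(ALIASES):
            return i, mapping
    raise ValueError("no header row with all required columns found")


# Hypothetical sample: a title line and a blank line sit above the real header.
sample = "Published by Org 1\n\nVendor,Amount,Purchaser,Payment Date\nAcme,12.50,Finance,03/15/2019\n"
rows = list(csv.reader(io.StringIO(sample)))
print(find_header(rows))
```

New spellings are absorbed by extending the alias sets, so a renamed column needs no change to the processing logic.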

    As far as I can see, I have two options for handling this:

    1. Create a separate transformation for each data stream, and maintain/update each one whenever its data file changes.

    2. Create a lookup table that records which data is in which column and what format it is in, and then create one transformation for each distinct format variation (e.g. Org 1 = date(mm/dd/yyyy), value($0.00), buyer(varchar255), supplier(varchar255)).
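The lookup-table idea in option 2 can be sketched outside Pentaho in a few lines of Python. This is a minimal sketch, assuming the table is a dictionary keyed by a hypothetical source id. `FORMATS` and `normalise` are illustrative names, not Pentaho APIs. Each entry records which column holds each common field and how dates and values are written, so one generic routine serves every source that shares a format.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical lookup table: one entry per source organisation, describing
# where each canonical field lives and how its text is formatted.
FORMATS = {
    "org1": {"columns": {"date": 0, "value": 1, "buyer": 2, "supplier": 3},
             "date_format": "%m/%d/%Y", "currency_symbol": "$"},
    "org2": {"columns": {"buyer": 0, "supplier": 1, "date": 2, "value": 3},
             "date_format": "%d-%m-%Y", "currency_symbol": "£"},
}


def normalise(source, row):
    """Convert one raw CSV row into the common buyer/supplier/value/date record."""
    spec = FORMATS[source]
    cols = spec["columns"]
    # Strip the currency symbol and thousands separators before parsing.
    raw_value = row[cols["value"]].replace(spec["currency_symbol"], "").replace(",", "")
    return {
        # Truncate to the varchar255 limit used in the example above.
        "buyer": row[cols["buyer"]].strip()[:255],
        "supplier": row[cols["supplier"]].strip()[:255],
        "value": Decimal(raw_value.strip()),
        "date": datetime.strptime(row[cols["date"]].strip(), spec["date_format"]).date(),
    }


print(normalise("org1", ["03/15/2019", "$1,250.00", "Finance", "Acme Ltd"]))
print(normalise("org2", ["Estates", "Bolt Co", "15-03-2019", "£99.95"]))
```

With this design, adding a source or absorbing a change to an existing one means editing a table entry rather than a transformation.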

    The second option looks more efficient in the long term, but it seems complex to develop and might stretch my meagre skills. If the second option is the better one, what would be the best way to implement it?

    Any advice?


  2. #2


    Have you checked the metadata injection step?

    I think it could help you deal with all those field inconsistencies.
    If you need only a few fields, you can use a "Select values" step to keep the ones you need.
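A rough Python analogue of that combination is a generic step that keeps and renames only the fields listed in metadata supplied at run time. This is a sketch, not PDI code: `select_values` and the `keep` mapping are hypothetical names. With metadata injection, roughly speaking, the equivalent mapping would be injected into a template transformation rather than passed as a function argument.

```python
import csv
import io


def select_values(reader, keep):
    """Yield dicts containing only the fields named in `keep`, renamed to the
    canonical names they map to. This loosely mirrors a Select values step
    whose field list is supplied from outside instead of being hard-coded."""
    for record in reader:
        yield {new: record[old] for old, new in keep.items()}


# Hypothetical per-source metadata; in practice this would come from the
# lookup table rather than being written inline.
keep = {"Vendor Name": "supplier", "Amount": "value"}
data = "Vendor Name,Amount,Invoice Ref,Cost Centre\nAcme,12.50,INV-1,CC9\n"
rows = list(select_values(csv.DictReader(io.StringIO(data)), keep))
print(rows)
```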


  3. #3


    Brilliant, thank you.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.