Hitachi Vantara Pentaho Community Forums

Thread: Use cases of Meta Data Injection

  1. #1
    Join Date
    Nov 1999
    Posts
    459

    Default Use cases of Meta Data Injection

    We are getting more and more requests to add metadata injection support to further steps.

    I would be interested in the use cases that drive this functionality, so we can document the feature better and enhance it further.

    Looking forward to your feedback on this thread!

    Thanks a lot in advance,
    Jens

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Jens,

    The one time that I needed to use it was for the following:

    • Loading a DB with an unknown number of extracts


    1. Incoming tables arrive as CSV files with an accompanying table definition, but no DDL.
    2. Tables need to be created from the definition, and the data then loaded from the CSV files into those tables.


    There was an unknown number of tables, and the column names were not known in advance (basically this was to build a scratch DB on the fly from a set of extracts).

    The steps that currently support Meta-Injection were sufficient (for the most part), and where they weren't, variables finished the job.
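
    To make that concrete, here is a minimal plain-Java sketch of what the injected template effectively has to do per extract: read the accompanying definition, create the table, and load the CSV. The definition-file layout (one "column_name,sql_type" pair per line), the table-name argument and the JDBC URL are invented for illustration; they are not the actual formats from this project.

        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.Paths;
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.util.List;
        import java.util.stream.Collectors;

        public class ScratchTableLoader {
            public static void main(String[] args) throws Exception {
                Path defFile = Paths.get(args[0]);   // hypothetical: one "column_name,sql_type" pair per line
                Path csvFile = Paths.get(args[1]);   // extract whose columns appear in the same order
                String table = args[2];              // target table name

                List<String[]> columns = Files.readAllLines(defFile).stream()
                        .map(line -> line.split(",", 2))
                        .collect(Collectors.toList());

                // Build the CREATE TABLE and a matching parameterised INSERT from the definition.
                String ddl = "CREATE TABLE " + table + " (" + columns.stream()
                        .map(c -> c[0] + " " + c[1]).collect(Collectors.joining(", ")) + ")";
                String insert = "INSERT INTO " + table + " (" + columns.stream()
                        .map(c -> c[0]).collect(Collectors.joining(", ")) + ") VALUES ("
                        + columns.stream().map(c -> "?").collect(Collectors.joining(", ")) + ")";

                // Connection details are placeholders for the example.
                try (Connection con = DriverManager.getConnection(
                        "jdbc:postgresql://localhost/scratch", "etl", "etl")) {
                    con.createStatement().execute(ddl);
                    try (PreparedStatement ps = con.prepareStatement(insert)) {
                        List<String> lines = Files.readAllLines(csvFile);
                        for (String line : lines.subList(1, lines.size())) {   // skip the header row
                            String[] values = line.split(",", -1);
                            for (int i = 0; i < columns.size(); i++) {
                                ps.setString(i + 1, values[i]);                // all text; scratch DB only
                            }
                            ps.addBatch();
                        }
                        ps.executeBatch();
                    }
                }
            }
        }

    In the project above, the injection-enabled steps plus a couple of variables handled all of this without a hand-written loader.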

  3. #3
    Join Date
    Nov 2011
    Posts
    18

    Default

    I have 2 use cases that I actively use Metadata Injection for today, but there are a lot of other use cases where I sometimes wonder whether Metadata Injection would help.

    The 2 use cases that I use today are:

    1. Loading a large number of CSV files into a staging database. The CSV files have header rows that match the database field names exactly, so we were able to use Metadata Injection to load these files rather than having to write 50 transformations. (We did use a User Defined Java Class to load the database, since Metadata Injection support for steps that output, update, or insert to a table is still very limited; as of 5.1 it is only the Table Output step. A rough sketch of that class follows after these two cases.)

    2. Extracting a large number of database tables to JSON files that can be loaded into and processed in Hadoop (after writing a few Metadata Injection patches). We used the JDBC metadata plugin to get the database table metadata and then Metadata Injection to automatically create the transformations to do this. There were two huge benefits. First, it insulated us from database table structure changes, as any new columns are automatically added to the extracts without us having to modify code. Second, rather than having to do two weeks of mind-numbing work to build each transformation, Metadata Injection did it for us.
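
    For the first case, the User Defined Java Class essentially inserts each incoming row into a table whose name arrives on the stream, so a single template serves every file. The snippet below is only a rough sketch of that idea rather than the actual class: the field names (target_table, column_name, value), the single-column insert, and the JDBC connection details are placeholders.

        // Body of a User Defined Java Class step (sketch). Fully qualified JDBC
        // class names are used so no separate import block is needed.
        private java.sql.Connection con;
        private java.sql.PreparedStatement ps;

        public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
            Object[] r = getRow();
            if (r == null) {
                try {
                    if (ps != null) { ps.executeBatch(); con.close(); }
                } catch (java.sql.SQLException e) {
                    throw new KettleException(e);
                }
                setOutputDone();
                return false;
            }
            try {
                if (first) {
                    first = false;
                    // Table and column names arrive as ordinary fields on the stream
                    // (placeholder names), so the same template serves every CSV file.
                    String table = get(Fields.In, "target_table").getString(r);
                    String column = get(Fields.In, "column_name").getString(r);
                    con = java.sql.DriverManager.getConnection(
                            "jdbc:postgresql://localhost/staging", "etl", "etl");
                    ps = con.prepareStatement("INSERT INTO " + table + " (" + column + ") VALUES (?)");
                }
                ps.setString(1, get(Fields.In, "value").getString(r));
                ps.addBatch();
            } catch (java.sql.SQLException e) {
                throw new KettleException(e);
            }
            putRow(data.outputRowMeta, r);
            return true;
        }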

    As for other use cases where I would like to consider using Metadata Injection:

    3. The first use case above also ties into the situation where I receive data from multiple sources in multiple formats, all containing the same type of data. With Metadata Injection, rather than maintaining a separate transformation for every format, a simple Excel file mapping the input fields to the table fields would be better (a sketch of reading such a mapping file follows this list).

    4. The second use case above ties into the situation where I need to send exports to multiple vendors in multiple formats, all containing the same data. Again, an Excel file with Metadata Injection would be better than a separate transformation for every export.

    And then some really out-of-the-box stretch use cases:

    5. I could see eventually using Metadata Injection with Visual MapReduce to maintain the input and output record structures for the mappers and reducers. In Visual MapReduce the key and value from the mapper are usually a string of multiple fields concatenated together, and the first thing the reducer does is split those apart again. Rather than having to maintain both ends manually every time something changes, could Metadata Injection solve that?

    6. Could you use Metadata Injection to read a Mondrian schema or maybe even MDX logs and dynamically generate aggregate tables for Mondrian?

    7. The transformations that build the dimension tables for your star schema are often virtually identical, changing only the source table, target table, and mapped fields. Would Metadata Injection with an Excel sheet describing the mapping be a better solution?

    I could probably come up with a lot more, but I think that is enough for now.
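
    To illustrate the Excel mapping idea from points 3, 4, and 7, a small reader like the sketch below could turn each row of the workbook into the source-to-target field pairs that get streamed into the injection step. The sheet layout (source field in column A, target field in column B, header in row 1) and the use of Apache POI are assumptions for the example, not an existing project.

        import java.io.File;
        import org.apache.poi.ss.usermodel.Row;
        import org.apache.poi.ss.usermodel.Sheet;
        import org.apache.poi.ss.usermodel.Workbook;
        import org.apache.poi.ss.usermodel.WorkbookFactory;

        public class MappingSheetReader {
            public static void main(String[] args) throws Exception {
                // Assumed layout: column A = source field, column B = target field, row 1 = header.
                try (Workbook wb = WorkbookFactory.create(new File(args[0]))) {
                    Sheet sheet = wb.getSheetAt(0);
                    for (Row row : sheet) {
                        if (row.getRowNum() == 0) {
                            continue;   // skip the header row
                        }
                        String sourceField = row.getCell(0).getStringCellValue().trim();
                        String targetField = row.getCell(1).getStringCellValue().trim();
                        // In PDI these pairs would be streamed into the ETL Metadata
                        // Injection step as the field mappings instead of printed out.
                        System.out.println(sourceField + " -> " + targetField);
                    }
                }
            }
        }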

    Beyond those detailed use cases, there are 2 general ways Metadata Injection can be used. The first is to dynamically configure a transformation at run time. The second is ktr generation: getting the boring details of developing a bunch of transformations done quickly, while still allowing you to go in and do more complex development by editing those transformations.

    Both of these high-level use cases are significantly hampered by the lack of an option to set parameters in Metadata Injection (http://jira.pentaho.com/browse/PDI-7224) and by the limited step support, primarily for output steps today.

    The ktr generation use case requires even more step support than dynamic execution, because parameters are not practical there: passing something as a parameter to a ktr that is being generated with Metadata Injection does no good, since the parameter is not saved anywhere in the generated ktr, so things you might do with parameters for dynamic execution cannot be done. The other challenge with using Metadata Injection for code generation is that it does not change the name of the generated ktr. If your template transformation is named "template", every generated ktr is also named "template", which makes them hard to work with in Spoon.
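
    Until that is addressed, one workaround is to post-process each generated file and overwrite its internal name. The sketch below assumes the standard .ktr XML layout, where the display name lives under /transformation/info/name; the file path and new name are just example arguments.

        import java.io.File;
        import javax.xml.parsers.DocumentBuilderFactory;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.dom.DOMSource;
        import javax.xml.transform.stream.StreamResult;
        import javax.xml.xpath.XPathConstants;
        import javax.xml.xpath.XPathFactory;
        import org.w3c.dom.Document;
        import org.w3c.dom.Node;

        public class RenameGeneratedKtr {
            public static void main(String[] args) throws Exception {
                File ktr = new File(args[0]);   // generated transformation file
                String newName = args[1];       // e.g. "load_customers"

                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(ktr);

                // A .ktr stores its display name under /transformation/info/name;
                // overwrite it so the file no longer shows the template's name in Spoon.
                Node nameNode = (Node) XPathFactory.newInstance().newXPath()
                        .compile("/transformation/info/name")
                        .evaluate(doc, XPathConstants.NODE);
                nameNode.setTextContent(newName);

                TransformerFactory.newInstance().newTransformer()
                        .transform(new DOMSource(doc), new StreamResult(ktr));
            }
        }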

  4. #4
    Join Date
    Aug 2009
    Posts
    3

    Default

    A while back I had to read some 600+ files in a variety of formats. Not all files had all columns present; part of the information was in the headers rather than in a data column (e.g., "Sales (EUR)" instead of a currency column and a sales column); some files had typos in their names; some files had different date formats; and so on. Whatever format discrepancy you can think of, some files had it.

    Plus, I had no control over the file format, and no indication of which format each file used without reading it first.

    So metadata injection was the obvious choice: inspect the header row of each file and work through the various oddities. The algorithm ends up being more complex than strictly necessary because a lot of useful steps don't support metadata injection, which leaves a very limited set of options.
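
    A stripped-down sketch of that header inspection might look like the following: read only the first line of each file, normalise the column names, and pull header-embedded information such as "Sales (EUR)" out into its own field. The directory layout, the semicolon delimiter and the "(XXX)" currency convention are assumptions made for the example, not the actual 600-file mix.

        import java.io.BufferedReader;
        import java.nio.file.DirectoryStream;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.Paths;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class HeaderInspector {
            public static void main(String[] args) throws Exception {
                Pattern currency = Pattern.compile("^(.*?)\\s*\\(([A-Z]{3})\\)$");   // e.g. "Sales (EUR)"

                try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[0]), "*.csv")) {
                    for (Path file : files) {
                        String header;
                        try (BufferedReader reader = Files.newBufferedReader(file)) {
                            header = reader.readLine();   // only the header row is needed
                        }
                        List<String> fields = new ArrayList<String>();
                        String fileCurrency = null;
                        for (String raw : header.split(";")) {
                            Matcher m = currency.matcher(raw.trim());
                            if (m.matches()) {
                                fields.add(m.group(1).toLowerCase());   // keep "sales" as the field name
                                fileCurrency = m.group(2);              // and remember the currency
                            } else {
                                fields.add(raw.trim().toLowerCase());
                            }
                        }
                        // These per-file field lists are what would be fed to the
                        // ETL Metadata Injection step instead of printed.
                        System.out.println(file.getFileName() + " -> " + fields
                                + (fileCurrency == null ? "" : " [currency=" + fileCurrency + "]"));
                    }
                }
            }
        }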

    Cheers,
    Nelson

  5. #5
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Hi,
    we had a very similar case recently.
    I had to read 100+ CSV files, add some data using a Stream Lookup, then aggregate them with a Group By, and finally save them as text files.
    Most of it is already doable using metadata injection, but not the Text file output step (as far as I could see).

    BTW: some more examples of how to deploy it would be useful!
    -- Mick --
