Hitachi Vantara Pentaho Community Forums

Thread: Data Lineage - Via Autodoc step?

  1. #1
    Join Date
    Apr 2007
    Posts
    2,010

    Default Data Lineage - Via Autodoc step?

    Hi Matt,

    We spoke briefly about doing lineage via the autodoc step.

    You mentioned that this step exposes the transMeta object. The implication (or so I thought!) was that this step would output this information for later processing.

    However, it doesn't seem to. The only output from the step in Spoon is the filename; the file type field is consumed.

    Looking at the code, I do see something that seems to imply it is doing this:

    outputRow[outputIndex++] = transMeta; // the step appends the TransMeta object itself to the output row

    But it doesn't appear when you look at the output fields of the step.

    So am I missing something here?

    Thanks,
    Dan

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    As shown in the auto-doc example, you can get hold of the TransMeta or JobMeta object in a JavaScript step.
    With those objects you can, for example, call transMeta.getSteps() to get the list of steps, see the databases used, and so on.
    You can also call transMeta.analyseImpact() as described in Pentaho Kettle Solutions.

    All in all, you can extract any information you like this way.

    The big difference from the method described in PKS is that the new step also supports loading metadata from a repository.
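
    To make that concrete, here is a minimal sketch in plain Java against the Kettle API (the class name and .ktr path are made up, and a Kettle 4.x classpath is assumed); in a JavaScript step you would make the same calls on the transMeta object directly:

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.core.database.DatabaseMeta;
    import org.pentaho.di.trans.TransMeta;
    import org.pentaho.di.trans.step.StepMeta;

    public class InspectTransformation {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();                                // initialise Kettle and its plugins
            TransMeta transMeta = new TransMeta("/tmp/example.ktr"); // hypothetical transformation file

            // The steps that make up the transformation
            for (StepMeta step : transMeta.getSteps()) {
                System.out.println("Step: " + step.getName());
            }

            // The database connections the transformation defines
            for (DatabaseMeta database : transMeta.getDatabases()) {
                System.out.println("Database: " + database.getName());
            }
        }
    }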

  3. #3
    Join Date
    Apr 2007
    Posts
    2,010

    Default

    Perfect, thanks Matt. Foolishly, I didn't look at the autodoc example!

  4. #4
    Join Date
    Apr 2007
    Posts
    2,010

    Default

    Mwhahahaha, this works great.
    I can now access all the steps, and all their output fields, from this code.
    So that opens up the possibility of implementing a process to map from source to target (which, sure, is going to be hard in some cases, but simple in many).
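
    For anyone following along, the core of it is the getStepFields() call. A minimal sketch in plain Java (hypothetical .ktr path, Kettle 4.x classpath assumed; the same call works on the transMeta object inside the JavaScript step):

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.core.row.RowMetaInterface;
    import org.pentaho.di.trans.TransMeta;
    import org.pentaho.di.trans.step.StepMeta;

    public class DumpStepFields {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            TransMeta transMeta = new TransMeta("/tmp/example.ktr"); // hypothetical transformation file

            // For every step, ask the transformation which row layout leaves that step
            for (StepMeta step : transMeta.getSteps()) {
                RowMetaInterface fields = transMeta.getStepFields(step);
                System.out.println(step.getName() + ": " + fields.toStringMeta());
            }
        }
    }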

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    A transformation doesn't really have a start and an end; it's a network. So I think the trick is to limit yourself to actual use cases. If you dump all the source and target step field combinations you end up with millions of combinations: a lot of data, very little information.

    For database impact analyses we created a separate API, so that the steps give feedback on which tables/columns they read/write/update in which database.
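
    A minimal sketch of calling that API in plain Java (hypothetical .ktr path, Kettle 4.x classpath assumed, and analyseImpact() is assumed to accept a null progress monitor):

    import java.util.ArrayList;
    import java.util.List;
    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.DatabaseImpact;
    import org.pentaho.di.trans.TransMeta;

    public class ImpactAnalysis {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            TransMeta transMeta = new TransMeta("/tmp/example.ktr"); // hypothetical transformation file

            // Each step contributes DatabaseImpact entries for the tables/columns it touches
            List<DatabaseImpact> impacts = new ArrayList<DatabaseImpact>();
            transMeta.analyseImpact(impacts, null); // null: no progress monitor

            for (DatabaseImpact impact : impacts) {
                // getType() is one of the DatabaseImpact.TYPE_IMPACT_* constants (read/write/update/...)
                System.out.println(impact.getType() + " "
                        + impact.getDatabaseName() + "." + impact.getTable() + "." + impact.getField()
                        + " (step: " + impact.getStepName() + ")");
            }
        }
    }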

  6. #6
    Join Date
    May 2011
    Posts
    6

    Default

    I'm interested in learning more about this separate API for "database impact". Does this also work for "KTR impact", as in "which KTRs use a specific view/table"? Is this metadata dependent on using a Pentaho repository? Can this be done more easily using an existing SVN repository?
    Benjamin Simmons
    Consert - San Antonio
    bsimmons@consert.com
    benjamin.simmons@juno.com
