Hitachi Vantara Pentaho Community Forums

Thread: Can I compare .ktr file versions in source control?

  1. #1

    Can I compare .ktr file versions in source control?

    We are evaluating Pentaho Kettle for an ETL project. One of the concerns we got from developers is that it's very hard to compare different versions of transformation (KTR) files with file compare tools: they say the contents get shifted around in no specific order, so it's very hard to see the differences from one version to another.

    Is that true? And is there any good approach to comparing different versions?

    Best Regards,

  2. #2
    Join Date: Aug 2011
    Posts: 360


    Hi,

    it is true that the order in which steps/hops are written to a KTR file is quite random: they are written in the order in which they are stored in an internal Map/List, so basically in order of creation, or even of the last move! The same is true for KJB files (jobs) with job entries and job entry hops.

    However, for a given step/job entry, the metadata are always written in the same order in the XML (this is essentially a serialization of the step's metadata).
    The thing is, a KTR/KJB file basically describes a graph, with steps as vertices and hops as edges, so the order in the XML is not relevant.
    Moreover, steps are identified by their name (which must be unique within the transformation), and this name is set by the developer while designing the transformation, so step identifiers can change from one version to another.

    So yes, it is difficult to compare different versions with file compare tools, since what you really need to compare are graphs.
    And if a step name changes, you don't really have a means to identify its previous name. Maybe with smart rules based on the step type (e.g. a text file output) and its place in the graph? You could then say: stepA in trans version 1 and stepB in trans version 2 are both text file outputs, and both have only one hop, connecting them to the same identified step in both versions, so they must be the same step.
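    A minimal sketch of that rename heuristic in plain Java. It assumes each version has already been reduced to a map of step name to step type plus a list of hops (from name, to name); the class `RenameGuess` and its methods are hypothetical and not part of the Pentaho API. A step that disappeared is paired with a step that appeared only when both have the same type and exactly the same neighbour names, and only when the old step has exactly one such candidate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper (not part of the Pentaho API): guesses renamed steps. */
public class RenameGuess {

    /** A hop between two steps, identified by step name (from -> to). */
    public record Hop(String from, String to) {}

    /** Names of the steps directly connected to `step`, in either direction. */
    static Set<String> neighbours(String step, List<Hop> hops) {
        Set<String> result = new TreeSet<>();
        for (Hop h : hops) {
            if (h.from().equals(step)) result.add(h.to());
            if (h.to().equals(step)) result.add(h.from());
        }
        return result;
    }

    /**
     * Maps each step name that disappeared in v2 to the single step name that
     * appeared in v2 with the same type and the same neighbour names.
     * Old steps with zero or several candidates are left out.
     */
    public static Map<String, String> guess(Map<String, String> typesV1, List<Hop> hopsV1,
                                            Map<String, String> typesV2, List<Hop> hopsV2) {
        Map<String, String> renames = new TreeMap<>();
        for (String oldName : typesV1.keySet()) {
            if (typesV2.containsKey(oldName)) continue;        // same name still exists
            List<String> candidates = new ArrayList<>();
            for (String newName : typesV2.keySet()) {
                if (typesV1.containsKey(newName)) continue;    // not a new name
                boolean sameType = typesV1.get(oldName).equals(typesV2.get(newName));
                boolean sameNeighbours =
                        neighbours(oldName, hopsV1).equals(neighbours(newName, hopsV2));
                if (sameType && sameNeighbours) candidates.add(newName);
            }
            if (candidates.size() == 1) renames.put(oldName, candidates.get(0));
        }
        return renames;
    }
}
```

    Because it compares neighbour names, this guess fails when a neighbour was renamed in the same commit; a fuller tool would iterate, feeding confirmed renames back into the neighbour sets.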

    Maybe you could write a small Java KTR/KJB compare tool, using the Pentaho API, that would:
    1. reconcile step name+type between trans v1 and trans v2 --> identify which steps have the same name and type, and which are new / deleted
    2. then reconcile hops between trans v1 and trans v2 (since hops connect step names!)
    3. finally compare the attributes of all identified steps --> a real diff of each step's XML

    Then you could configure your source control tool to use this tool instead of the plain file compare.
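    The three steps could be sketched roughly as follows. To avoid guessing at the Pentaho API, this version reads the XML directly with the JDK's DOM parser. It assumes the usual KTR layout: `<step>` elements with `<name>` and `<type>` children directly under `<transformation>`, and hops stored as `<order>/<hop>` with `<from>` and `<to>` children. `KtrDiff` is a hypothetical name, and comparing the text content of each step's direct child elements is a simplification of a full XML diff.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical order-insensitive KTR comparator (assumed KTR XML layout, see above). */
public class KtrDiff {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable KTR document", e);
        }
    }

    /** Direct child elements of `parent` with the given tag name. */
    static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals(tag)) out.add(e);
        }
        return out;
    }

    static String text(Element parent, String tag) {
        List<Element> c = children(parent, tag);
        return c.isEmpty() ? "" : c.get(0).getTextContent().trim();
    }

    /** Step key "name|type" -> (child tag -> trimmed text content). */
    static Map<String, Map<String, String>> steps(Document doc) {
        Map<String, Map<String, String>> out = new TreeMap<>();
        for (Element step : children(doc.getDocumentElement(), "step")) {
            Map<String, String> attrs = new TreeMap<>();
            for (Node n = step.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element e) attrs.put(e.getTagName(), e.getTextContent().trim());
            }
            out.put(text(step, "name") + "|" + text(step, "type"), attrs);
        }
        return out;
    }

    /** Hops as "from -> to" strings in a sorted set, so file order is irrelevant. */
    static Set<String> hops(Document doc) {
        Set<String> out = new TreeSet<>();
        for (Element order : children(doc.getDocumentElement(), "order"))
            for (Element hop : children(order, "hop"))
                out.add(text(hop, "from") + " -> " + text(hop, "to"));
        return out;
    }

    /** Report whose content and order do not depend on element order in either file. */
    public static List<String> diff(String ktrV1, String ktrV2) {
        Document d1 = parse(ktrV1), d2 = parse(ktrV2);
        List<String> report = new ArrayList<>();
        Map<String, Map<String, String>> s1 = steps(d1), s2 = steps(d2);
        // 1. reconcile steps by name+type
        for (String k : s1.keySet()) if (!s2.containsKey(k)) report.add("deleted step " + k);
        for (String k : s2.keySet()) if (!s1.containsKey(k)) report.add("new step " + k);
        // 2. reconcile hops (they reference step names)
        Set<String> h1 = hops(d1), h2 = hops(d2);
        for (String h : h1) if (!h2.contains(h)) report.add("deleted hop " + h);
        for (String h : h2) if (!h1.contains(h)) report.add("new hop " + h);
        // 3. compare child elements of steps present in both versions
        for (String k : s1.keySet()) {
            if (!s2.containsKey(k)) continue;
            Set<String> tags = new TreeSet<>(s1.get(k).keySet());
            tags.addAll(s2.get(k).keySet());
            for (String tag : tags)
                if (!Objects.equals(s1.get(k).get(tag), s2.get(k).get(tag)))
                    report.add("changed step " + k + ": <" + tag + ">");
        }
        return report;
    }
}
```

    Since steps and hops are collected into sorted maps and sets, reordering them in the file produces no differences. If the step XML stores the canvas position (assumed here to be a `<GUI>` child), merely moving a step shows up as a changed `<GUI>` element.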

    Maybe with some XSLT you could generate helper files and run the file compare on those instead, such as:
    - a list of step name/step type, ordered by name
    - a list of hops, ordered by source and target name
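    A sketch of that helper-file idea: an XSLT 1.0 stylesheet, run here through the JDK's built-in `javax.xml.transform` API so it stays in Java. It assumes the KTR layout `<step>/<name>`, `<step>/<type>` and `<order>/<hop>/<from>`, `<to>`; the class name `KtrHelperFile` and the output line format are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical generator of a diff-friendly text view of a KTR file. */
public class KtrHelperFile {

    // Assumed KTR layout: <transformation><step><name/><type/></step><order><hop><from/><to/></hop></order>
    private static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/transformation">
            <!-- steps sorted by name -->
            <xsl:for-each select="step">
              <xsl:sort select="name"/>
              <xsl:value-of select="concat('step ', name, ' (', type, ')&#10;')"/>
            </xsl:for-each>
            <!-- hops sorted by source, then target -->
            <xsl:for-each select="order/hop">
              <xsl:sort select="from"/>
              <xsl:sort select="to"/>
              <xsl:value-of select="concat('hop ', from, ' -> ', to, '&#10;')"/>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Returns one line per step and per hop, in name order rather than file order. */
    public static String normalize(String ktrXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(ktrXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("cannot normalize KTR", e);
        }
    }
}
```

    The resulting text has a line order that depends only on names, so an ordinary line diff shows real structural changes. Note that `xsl:sort` uses the processor's text collation, which may order mixed-case names differently from a plain byte sort.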

  3. #3


    Thanks for the detailed answer, it's definitely very helpful for understanding how the process works. We can certainly look into developing this type of comparison tool; that's what I was thinking too, and you have laid out the high-level steps, which is great.

    But I have a feeling that this sort of consideration should be built into the tool itself when it generates the XML files. The overall intent should be to preserve the structure and disturb it only when needed, so that the output is always predictable. Would that be possible?

  4. #4
    Join Date: Aug 2011
    Posts: 360


    Hi,

    I think it would be possible, but you'd need to patch some code in the core Pentaho API, and there are some problems:
    - serialization of steps is handled in each step's own code: each step has to implement a getXML and a loadXML method (see the StepMeta interface). However, normally each step meta class extends the BaseStepMeta class, so you could tweak that one.
    - serialization of the transformation is handled in the TransMeta class.
    - the big problem is that steps don't have a technical, time-invariant ID; they are identified by their name.
    So in order to always get the same output, one should:
    - make TransMeta generate a UID for each step when it is created. The UID should not change when the step name changes, should be orderable, and should be strictly increasing. Then add this UID to the step metadata, and when serializing the transformation, write the steps in UID order.
    - change the hop metadata so that hops refer to the UIDs of steps rather than their names, then write them in order of source UID and target UID (or create a UID for hops too and use it for the write order).

    So this is a big change to the metadata of PDI files, and you would then need to handle old files as well.
    Quite a lot of work!

    So maybe first try with a custom comparator, and see if it meets your requirements.
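    To make the proposed scheme concrete, here is a toy model of it in Java. It is not Pentaho's `TransMeta`: `StableTrans`, its records, and the `uid` attribute in the output are hypothetical. Steps receive a UID from a strictly increasing counter at creation, renames leave the UID alone, hops refer to UIDs, and serialization sorts steps by UID and hops by (source UID, target UID), so the output no longer depends on in-memory order.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model, not Pentaho's TransMeta: steps keyed by a stable, growing UID. */
public class StableTrans {
    record Step(long uid, String name, String type) {}
    record Hop(long fromUid, long toUid) {}

    private long nextUid = 1;                                // strictly increasing counter
    private final Map<Long, Step> steps = new HashMap<>();   // in-memory order is irrelevant
    private final Set<Hop> hops = new HashSet<>();

    /** Creates a step with the next UID; the UID never changes afterwards. */
    public long addStep(String name, String type) {
        long uid = nextUid++;
        steps.put(uid, new Step(uid, name, type));
        return uid;
    }

    /** Renaming only touches the name; hops still point at the same UID. */
    public void rename(long uid, String newName) {
        Step s = steps.get(uid);
        steps.put(uid, new Step(uid, newName, s.type()));
    }

    /** Hops reference step UIDs, not step names. */
    public void addHop(long fromUid, long toUid) {
        hops.add(new Hop(fromUid, toUid));
    }

    /** Writes steps in UID order, hops in (source, target) UID order. No XML escaping: sketch only. */
    public String toXml() {
        StringBuilder sb = new StringBuilder("<transformation>\n");
        steps.values().stream()
             .sorted(Comparator.comparingLong(Step::uid))
             .forEach(s -> sb.append("  <step uid=\"").append(s.uid()).append("\"><name>")
                             .append(s.name()).append("</name><type>").append(s.type())
                             .append("</type></step>\n"));
        hops.stream()
            .sorted(Comparator.comparingLong(Hop::fromUid).thenComparingLong(Hop::toUid))
            .forEach(h -> sb.append("  <hop from=\"").append(h.fromUid())
                            .append("\" to=\"").append(h.toUid()).append("\"/>\n"));
        return sb.append("</transformation>\n").toString();
    }
}
```

    In this model, renaming a step changes exactly one output line, and the hop line stays identical because it refers to UIDs rather than names.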

  5. #5

    Hmm, yes, for now I think a comparator would be the logical choice, but is there any way to propose this change for the product?

  6. #6
    Join Date: Jan 2013
    Posts: 5


    Quote, originally posted by tariqjawed83:
    We are evaluating Pentaho Kettle for ETL project; one of the concerns we got from developers that its very hard to compare different versions of transformation (KTR) files through file compare tools? and they are saying that it will shift the contents in no specific ordering. So its very hard to compare differences from one version to another.

    Is that true? and is there any good approach to compare different versions.

    Best Regards,

    EDIT: I have now ported the command-line tool to a website that can be used without installing any software, on both desktop and mobile: https://difftr.oxplot.com/

    I wrote a command-line tool to generate a graphical diff (as HTML output) of KTR files when I worked with Pentaho a few years ago.

    See my blog post (which has a link to the code as well).
    Last edited by oxplot; 07-06-2018 at 02:07 AM.

  7. #7
    Join Date: Jun 2012
    Posts: 5,534

    Recommended reading: "Managing XML in Git or Mercurial? Watch out for your merges" by Joe Pairman, 2016-11-26
    So long, and thanks for all the fish.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.