Hitachi Vantara Pentaho Community Forums

Thread: Balanced Line Compare

  1. #1

    Balanced Line Compare

    Is the "balanced line compare" mode, i.e. comparing two sorted files (datasets), widely used?
    Before comparing, both files are sorted by the compare key. Comparing 2 rows (records) gives 3 cases: the keys are identical, or key of file A < key of file B (record only in A), or key of file A > key of file B (record only in B).
    I think the term comes from the COBOL and mainframe world.
    Is there a Kettle step for that?
    Thank you,
    Peter

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    The Merge Rows step in the Join section would be close to what you want.

    Regards,
    Sven

  3. #3
    DEinspanjer Guest

    No, it doesn't currently exist in Kettle. You would have to implement it in a plug-in step.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Daniel, what would be the difference between the "Merge rows" step and the requested feature?

  5. #5
    DEinspanjer Guest

    The way I understood his request at first, it is sort of a hybrid between a sort and a compare. Basically, take two ordered streams that must both have exactly the same number of rows (let's say 10 rows), then compare their defined keys row by row:

    1 <=> 1 = A == B
    2 <=> 2 = A == B
    3 <=> 2 = A > B
    3 <=> 3 = A == B
    3 <=> 4 = A < B
    5 <=> 6 = A < B
    7 <=> 7 = A == B
    9 <=> 8 = A > B
    9 <=> 9 = A == B
    10 <=> 10 = A == B
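
    The row-by-row reading shown in the table above can be sketched in a few lines of Python. This is only an illustration of that interpretation, not a Kettle step; the function name compare_pairwise and the two key lists are made up here to mirror the table.

```python
def compare_pairwise(keys_a, keys_b):
    """Compare two equally long key streams position by position."""
    if len(keys_a) != len(keys_b):
        raise ValueError("both streams must have the same number of rows")
    results = []
    for a, b in zip(keys_a, keys_b):
        if a == b:
            relation = "A == B"
        elif a > b:
            relation = "A > B"
        else:
            relation = "A < B"
        results.append((a, b, relation))
    return results


# Keys taken from the table above (left column = stream A, right = stream B).
keys_a = [1, 2, 3, 3, 3, 5, 7, 9, 9, 10]
keys_b = [1, 2, 2, 3, 4, 6, 7, 8, 9, 10]
pairwise = compare_pairwise(keys_a, keys_b)

for a, b, relation in pairwise:
    print(f"{a} <=> {b} = {relation}")
```

    Note that this never advances one stream independently of the other, which is the difference from a merge-style compare.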


    Going back and rereading the part he put in parentheses about "(record only in A) ... (record only in B)", maybe he really does just want the Merge Rows step, which would flag those cases as new vs. deleted.

    We'll have to wait and see what he says.

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Exactly my thoughts :-)

    FWIW, Kettle gives back 4 states: new, deleted, changed, identical.

    Matt

  7. #7

    Hello,

    Thank you for the answers! They are very useful. (Any advice is welcome, because I am quite new to Kettle.)

    "You would have to implement it in a plug-in step." Could you be more specific? Say I need this kind of compare with additional logic for handling the result. Do I have to write a Java class, compile it and pack it into a jar file, and then place that jar file in \plugins\steps together with a plugin.xml (also written by me) and a *.png file? Is it something like that? Which class would my class have to be a subclass of? Can I use the Javadoc and the source of Kettle? Where can I find the Javadoc and the source files? Thank you very much in advance!

    Otherwise, the result of the compare with the Merge Rows step seems to be exactly right. Apparently the Merge Rows step works in 'balanced line mode' (because, e.g., if one of the files isn't sorted, the result is not very reasonable). Here is my simple example:

    File A:   File B:
    a1        a1
    a2        a3
    a3        a4
    a5        a5
    a6        a7
              a8
              a9

    In the Merge Rows step, file A is defined as the origin file. Comparing in 'balanced line mode' means:
    1) Read the first row from file A and from file B -> a1 <=> a1
    2) If both keys are identical (in our case yes), read the next row from file A and the next row from file B -> a2 <=> a3
    3) If key (file A) < key (file B) (in our case yes, a2 < a3), read the next row from file A and compare again (result: a2 is only in file A; the Merge Rows step says "deleted", which is correct because file A is the origin)
    4) Identical in our case (a3 = a3), so read the next row from file A and the next row from file B -> a5 <=> a4, and compare
    5) If key (file A) > key (file B) (in our case yes, a5 > a4), read the next row from file B and compare again (result: a4 is only in file B; the Merge Rows step says "new", which is correct because file A is the origin)
    6) Identical in our case (a5 = a5), so read the next row from file A and the next row from file B -> a6 <=> a7, and compare
    ...

    Therefore the result of merge rows step is OK:
    Field1;flagfield
    a1;identical
    a2;deleted
    a3;identical
    a4;new
    a5;identical
    a6;deleted
    a7;new
    a8;new
    a9;new
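
    The walk-through above can be sketched as a short Python function. It is a minimal illustration under stated assumptions, not the actual Merge Rows implementation: the name balanced_line_compare is made up, both inputs are assumed to be sorted ascending with unique keys, and only keys are compared, so the "changed" state (same key, different values) never appears here.

```python
def balanced_line_compare(file_a, file_b):
    """Merge-compare two key lists; file_a is treated as the origin.

    Assumes both lists are sorted ascending and contain unique keys.
    """
    result = []
    i = j = 0
    while i < len(file_a) and j < len(file_b):
        a, b = file_a[i], file_b[j]
        if a == b:
            # Key present in both files: advance both sides.
            result.append((a, "identical"))
            i += 1
            j += 1
        elif a < b:
            # Key only in file A (the origin): advance A only.
            result.append((a, "deleted"))
            i += 1
        else:
            # Key only in file B: advance B only.
            result.append((b, "new"))
            j += 1
    # Whatever is left on one side has no partner on the other.
    result.extend((a, "deleted") for a in file_a[i:])
    result.extend((b, "new") for b in file_b[j:])
    return result


file_a = ["a1", "a2", "a3", "a5", "a6"]
file_b = ["a1", "a3", "a4", "a5", "a7", "a8", "a9"]
merged = balanced_line_compare(file_a, file_b)

print("Field1;flagfield")
for key, flag in merged:
    print(f"{key};{flag}")
```

    If one of the inputs is not sorted, the comparisons no longer line up and rows get flagged as deleted or new even though a partner exists further down, which matches the unreasonable results mentioned above.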

    How I use this result is another topic. I guess with a filter ...?

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Filter Rows can be used to split them or the new Switch/Case step in 3.1.0.
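
    Outside of Kettle, the same kind of split on the flag field can be sketched in Python. This is only an analogy for what a Filter Rows or Switch/Case step does with the flag; the function name split_by_flag is made up, and the rows are copied from the result posted above.

```python
from collections import defaultdict


def split_by_flag(rows):
    """Route each (key, flag) row into one bucket per flag value."""
    buckets = defaultdict(list)
    for key, flag in rows:
        buckets[flag].append(key)
    return dict(buckets)


rows = [
    ("a1", "identical"), ("a2", "deleted"), ("a3", "identical"),
    ("a4", "new"), ("a5", "identical"), ("a6", "deleted"),
    ("a7", "new"), ("a8", "new"), ("a9", "new"),
]
targets = split_by_flag(rows)

for flag, keys in targets.items():
    print(flag, keys)
```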
