Hitachi Vantara Pentaho Community Forums

Thread: Comparative approach between PDI and Talend on an ETL sample

  1. #1
    Join Date
    Mar 2007
    Posts
    216

    Smile Comparative approach between PDI and Talend on an ETL sample

    Hi,

    I am a PDI user, but recently I could not build one of my transformations with PDI; I then tried Talend and managed to do it there.
    I will first describe the transformation to build.
    Then I describe my approach in PDI and my approach in Talend.
    The two approaches are different, and I would like to discuss them.

    The transformation to do :
    -Input table
    Code:
    id        c01       c02       c03       v01       v02       v03
    1001      1          2          3          a          b          c
    1002      2          3          1          d          e          f
    1003      3          2          1          g          h          i
    -Information table
    Code:
    code label
    1    colour
    2    size
    3    price
    -Output table (my goal)
    Code:
    id   colour size price
    1001 a      b     c
    1002 e      f     d
    1003 i      h     g
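    For clarity, the mapping can be sketched in plain Python (illustrative only, not PDI or Talend code). Reading the sample tables consistently, each c0k tells which v column holds the value of attribute k (the attribute whose code is k in the information table); this reading reproduces the output table above for all three rows.

```python
# Illustrative Python sketch of the target transformation (not PDI/Talend code).
labels = {1: "colour", 2: "size", 3: "price"}  # the Information table

input_rows = [
    {"id": 1001, "c01": 1, "c02": 2, "c03": 3, "v01": "a", "v02": "b", "v03": "c"},
    {"id": 1002, "c01": 2, "c02": 3, "c03": 1, "v01": "d", "v02": "e", "v03": "f"},
    {"id": 1003, "c01": 3, "c02": 2, "c03": 1, "v01": "g", "v02": "h", "v03": "i"},
]

output = []
for row in input_rows:
    out = {"id": row["id"]}
    for k in (1, 2, 3):
        col = row[f"c0{k}"]             # c0k: which v column holds attribute k
        out[labels[k]] = row[f"v0{col}"]
    output.append(out)
# e.g. output[0] == {"id": 1001, "colour": "a", "size": "b", "price": "c"}
```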
    My approach with PDI is to chain <<Filter Rows>> and <<Select Value>> steps:


    My approach with Talend is to write, manually, the conditional expression to evaluate for each output field:



    Some remarks :
    Code:
                                                       PDI        Talend        Winner
    Count of conditional expressions                   9              6        Talend
    Count of (C01,C02,C03) possible combinations       3             all       Talend
    -I have not been able to use variables, either in PDI or in Talend; maybe they are too complex to use (at least for me).
    -I have been able to use the second approach with 15 c0x columns and 7 output fields, just by using a notepad's "search and replace" function.
    This would have needed 100+ steps with my PDI approach.

    My PDI approach seems very bad.
    All remarks welcome.

    a+,=)
    -=Clément=-

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I think this is a simple normalization exercise. I doubt you would even need a filter to do it.
    Let me make a small "Normaliser" example...

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Thumbs up

    OK, here it is:

    Attachment: decoding-sample.png (5.5 KB)

    The transformation + your sample data: sample.zip

    The interesting part is that you can indeed use variables for the key values.
    However, I'm against using parameters for the target field names in the "Denormaliser" step: that would make the outcome of this transformation non-deterministic.
    In other words, it would be very hard or impossible to tell in advance what the row layout coming out of that step would be.

    Then again, I'm sure it will be very easy to manage these 2 steps ;-)
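    For readers without the sample handy, the Normaliser -> Denormaliser idea can be sketched in plain Python (illustrative only; in PDI these are two configured steps, not code). This assumes the pairing semantics implied by the sample tables in the first post:

```python
# Illustrative Python sketch of a Normaliser -> Denormaliser chain (not PDI code).
labels = {1: "colour", 2: "size", 3: "price"}  # the Information table

rows = [
    {"id": 1001, "c01": 1, "c02": 2, "c03": 3, "v01": "a", "v02": "b", "v03": "c"},
    {"id": 1002, "c01": 2, "c02": 3, "c03": 1, "v01": "d", "v02": "e", "v03": "f"},
    {"id": 1003, "c01": 3, "c02": 2, "c03": 1, "v01": "g", "v02": "h", "v03": "i"},
]

# "Normaliser": unpivot the wide row into (id, position, code) and (id, position, value)
code_rows = [(r["id"], k, r[f"c0{k}"]) for r in rows for k in (1, 2, 3)]
value_rows = [(r["id"], i, r[f"v0{i}"]) for r in rows for i in (1, 2, 3)]

# Pair each code with the value column it points at, via an in-memory hash lookup
# (analogous to the hash-table approach the PDI steps use internally).
value_at = {(rid, i): v for rid, i, v in value_rows}
pairs = [(rid, labels[k], value_at[(rid, c)]) for rid, k, c in code_rows]

# "Denormaliser": pivot the (id, label, value) triples back into one row per id
out = {}
for rid, label, value in pairs:
    out.setdefault(rid, {"id": rid})[label] = value
```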

    Matt
    Last edited by MattCasters; 10-17-2007 at 12:10 PM.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Oh Clément, don't forget to inform us about who won in the second round :-)

  5. #5

    Default

    And the winner is .. :-)

    Samatar

  6. #6
    Join Date
    Mar 2007
    Posts
    216

    Thumbs up PDI wins !

    Hi,

    Code:
                                                       PDI        Talend        Winner
    Count of conditional expressions                   0              6        Kettle
    Count of (C01,C02,C03) possible combinations       all           all       Draw
    [out of subject]As you noticed, I already wrote on the forum about a related issue ( http://forums.pentaho.org/showpost.p...69&postcount=9 ). At that time I designed (on paper) a transformation with two Normaliser steps chained (normalising code then value), then a Filter step (<<type1==type2>>, to get the same output as your single Normaliser) and a Denormaliser step. I studied the Normaliser step sample and the Denormaliser step sample, but I was missing a "Normaliser-Denormaliser" sample (even if the "2 series of keyvalue pairs" one is not far from this). That's why I finally ended up with a basic (and unusable) "If Then Else" algorithm.[/out of subject]

    Thanks a lot Matt; since using multiple tools is not an option for me, this really helps.

    a+, =)
    -=Clément=-

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Glad I could help. As always, the devil is in the details. A lot of things can be done in both tools.
    It's the way that problems are solved that makes the difference in the end.

    You can see by looking at the Denormaliser step that a lot of effort has been put into it over time.
    When I was working with Oracle Warehouse Builder we had an "Un-pivot" operator, but you still needed 3 or more operators to change data types, evaluate expressions, add constants, etc. to get the same result. And it was still limited to 32 fields. The performance of the Denormaliser step is the same regardless of the number of fields you add.
    I don't know how Talend does it, but without some sort of "Un-pivot" or "Denormaliser" you just can't get a decent result without severe performance loss and a lot of coding work (as you found out, I guess ;-)).

    Both steps in the sample run in-memory using hash-tables. Performance should be close to ... "optimal".

    Matt

  8. #8

    Default

    Matt, why not put this one in the samples folder? :-)

    Samatar

  9. #9
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Why, what a great idea, Samatar!


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.