Hitachi Vantara Pentaho Community Forums
Thread: How to optimize a lookup step against a massive DB

  1. #1

    How to optimize a lookup step against a massive DB

    Dear all,

    I have a very large table that stores millions of personal master data records, with about 25 columns.
    I only have a key (the personal fiscal code) to check against this table, in order to find addresses and other data.
    I tried a standard lookup step, but after a while Kettle bogs down.
    I'm not a programmer, so please explain the best way to do this in simple terms, or better, if you have a simple example, please attach it.

    I use Prosapconn, and the big database is DB2 on AS400. The Kettle version is 2.5.2 because of Prosapconn license compliance.

    Many thanks
    Giovanni

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    When you have 2 big data sets that you need to join, it's usually better to use a "Merge Join" step to do this work.

    Please note that your SAP connector is also available for versions >= 3.0
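
    To illustrate the idea (only a sketch in Python, not Kettle code): a merge join walks two inputs that are already sorted on the join key and pairs up rows with equal keys, so neither side has to be searched row by row. The `fiscal_code` and `address` field names and the sample values are made up for the example, and the lists stand in for what would really be two sorted row streams.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are BOTH already sorted by `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key has no partner yet, advance left
        elif lk > rk:
            j += 1          # right key has no partner yet, advance right
        else:
            # Find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ... and pair it with every left row sharing the same key.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**r, **left[i]})
                i += 1
            j = j_end
    return out


# Hypothetical sample rows, both sorted by fiscal_code.
codes = [{"fiscal_code": "AAA"}, {"fiscal_code": "CCC"}]
master = [
    {"fiscal_code": "AAA", "address": "Via Roma 1"},
    {"fiscal_code": "BBB", "address": "Via Milano 2"},
    {"fiscal_code": "CCC", "address": "Via Torino 3"},
]
joined = merge_join(codes, master, "fiscal_code")
```

    Both inputs must be sorted on the key before the join; with unsorted input, matching rows are missed without any error.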

  3. #3

    Thanks

    Hi Matt,

    Thanks for your reply. Could you please tell me a little more, because I last used Merge Join a long time ago.
    Would you kindly have a short sample?

    The license for upgrading to 3.0 is quite expensive, and customers don't like it.

    All the best and thanks again
    Giovanni

  4. #4

    Attachment

    In the "Sort rows" and "Select values" steps the system runs very, very slowly and seems blocked.

    Records in ANANGASL13 = 800,000
    Attached Files

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    I don't use the plugin myself so I can't open your transformation.
    That said, you should always sort in the database if you can.
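
    As a rough illustration of sorting in the database, the query itself can carry an ORDER BY, so the rows arrive already sorted and no separate sort step is needed downstream. The sketch below uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for the real DB2 connection; the `master` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE master (fiscal_code TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO master VALUES (?, ?)",
    [("CCC", "Via Torino 3"), ("AAA", "Via Roma 1"), ("BBB", "Via Milano 2")],
)

# ORDER BY makes the database do the sorting before rows reach the client.
sorted_rows = conn.execute(
    "SELECT fiscal_code, address FROM master ORDER BY fiscal_code"
).fetchall()
```

    One thing to watch: the database's sort order (collation) has to agree with the order the join step expects, otherwise the join can still miss rows.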

    Anyway, "a few million" rows isn't all that much. You can try the Merge Join OR loading the data in memory with "Stream Lookup".
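
    The in-memory approach boils down to reading the reference rows once into a hash table keyed on the lookup field, then probing that table for every incoming row. A minimal Python sketch of that idea follows; it is not how Kettle implements the step, and the field names and sample rows are invented.

```python
def build_lookup(reference_rows, key):
    """Read the reference rows once into a dict keyed on `key`."""
    return {row[key]: row for row in reference_rows}


def stream_lookup(stream_rows, lookup, key, fields):
    """Add `fields` from the in-memory lookup to each incoming row."""
    for row in stream_rows:
        match = lookup.get(row[key])
        # Unmatched keys get None for every looked-up field.
        extra = {f: (match[f] if match else None) for f in fields}
        yield {**row, **extra}


# Hypothetical reference data and incoming rows.
master = [
    {"fiscal_code": "AAA", "address": "Via Roma 1"},
    {"fiscal_code": "BBB", "address": "Via Milano 2"},
]
lookup = build_lookup(master, "fiscal_code")
incoming = [{"fiscal_code": "BBB"}, {"fiscal_code": "ZZZ"}]
enriched = list(stream_lookup(incoming, lookup, "fiscal_code", ["address"]))
```

    Memory use grows with the number of reference rows and the width of each row, so it helps to read only the columns that are actually needed.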

    Locking yourself into 2.5.x is not really a good idea though. 3.x is several times faster all by itself without changing anything.

  6. #6

    Sorting the SAP output is pure horror...

    Do I understand the transformation correctly?
    For every entry in ANANGASL13 you want to look up fields from ZMM_TRAC_FARMACO?

    Then Merge Join is not your friend :-)

    The PROSAPCONN SAP step accepts input rows,
    which you can use inside the SAP step.
    I don't have your SAP system available, but if you check my modification,
    you may get the point.
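
    The general pattern, feeding the keys from the stream into the query so that only matching rows are fetched instead of the whole table, can be sketched in plain Python. This is an analogy only: it uses sqlite3 as a stand-in data source and does not reproduce the PROSAPCONN step or its options, and the `master` table, its columns, and the `lookup_in_batches` helper are hypothetical.

```python
import sqlite3


def lookup_in_batches(conn, keys, batch_size=500):
    """Fetch addresses only for the given keys, one batch at a time."""
    found = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        placeholders = ",".join("?" for _ in batch)
        sql = (
            "SELECT fiscal_code, address FROM master "
            f"WHERE fiscal_code IN ({placeholders})"
        )
        # The keys are passed as bound parameters, never pasted into the SQL.
        for code, address in conn.execute(sql, batch):
            found[code] = address
    return found


conn = sqlite3.connect(":memory:")  # stand-in for the real source system
conn.execute("CREATE TABLE master (fiscal_code TEXT PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO master VALUES (?, ?)",
    [("AAA", "Via Roma 1"), ("BBB", "Via Milano 2"), ("CCC", "Via Torino 3")],
)
result = lookup_in_batches(conn, ["CCC", "AAA", "ZZZ"], batch_size=2)
```

    With an index on the key column, each batch touches only the requested rows, which is why this tends to scale better than sorting or loading the full table when the key list is much smaller than the table.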
    Attached Files

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.