Hitachi Vantara Pentaho Community Forums

Thread: numerous lookups on the same table

  1. #1
    Join Date
    Apr 2009
    Posts
    337

    Default numerous lookups on the same table

    Hi,

    I have this massive dimension which has around 120 columns and fetches data for around 40-odd of those columns by looking up the same table, say TABLE_CODES.

    Now, by passing a code key and value, this table returns the description which has to be loaded into the dimension.

    This lookup table is static and contains around 15k records. I am finding that I have to use 40 lookup steps over the same table to fetch values for these columns. The cache size also becomes a problem: with a cache limit of 1000, my cache multiplies to 40 * 1000!

    Irrespective of the system memory, is there an alternate way to carry out this activity in Pentaho, or to set an optimum cache?

    P.S. I understand the hardest way would be to write this all in a single Table Input step to fetch values for all these lookups.
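    Something like this, I guess (the column names are made up here, just to show the shape):

        SELECT s.*,
               t1.description AS col_a_desc,
               t2.description AS col_b_desc
               -- ... 38 more columns like these
        FROM   source_table s
        LEFT JOIN TABLE_CODES t1
               ON  t1.code_id    = s.col_a   -- code key from one source column
               AND t1.code_value = s.col_b   -- code value from another
        LEFT JOIN TABLE_CODES t2
               ON  t2.code_id    = s.col_c
               AND t2.code_value = s.col_d
        -- ... 38 more joins against the same TABLE_CODES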
    Regards,
    Madhu

  2. #2
    Join Date
    Apr 2007
    Posts
    2,010

    Default

    How about a single Table Input step to read all 15k rows, copying its output to the 40 necessary Stream Lookup steps?
    Ah; I guess each Stream Lookup will still cache all 15k rows, though.
    (I think I may have misunderstood your problem; perhaps paste a pic of the transform?)

  3. #3
    Join Date
    Apr 2009
    Posts
    337

    Default

    Quote Originally Posted by codek
    Ah; I guess each Stream Lookup will still cache all 15k rows, though.
    (I think I may have misunderstood your problem; perhaps paste a pic of the transform?)
    You haven't misunderstood at all! Stream Lookup was what I thought of too, but it caches all 15k rows 40 times!
    Regards,
    Madhu

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    So...

    Is the lookup table something like:
    ID | Code | Key    | Value
    1  | 15   | Status | Red
    2  | 15   | Brand  | MyOwnBrand
    3  | 16   | Status | Orange
    4  | 16   | Brand  | SomeOtherBrand

    If it's not, then why not return all 40 columns in a single lookup?

    If you give us a very small sample of your TABLE_CODES (as a text file), along with your Transform, the community can do some really amazing things!

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    The best way to solve this is, as gutlez says, to join the 2 data sets, creating a Cartesian product (with a Merge Join, for example).
    For each input row you'll then have, say, 1-N rows. Then you add a "Row de-normaliser" step to flatten the data back to one row, resulting in the 40 columns you wanted.

    No caching is needed in this case, and the join works even better if you can do it in the database.
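
    A rough sketch, assuming you first normalise the 40 (code id, code value) column pairs into one row per lookup, each tagged with the name of the target field:

        SELECT n.row_id,
               n.field_name,            -- which dimension column this feeds
               c.description
        FROM   normalised_source n      -- your stream, unpivoted
        LEFT JOIN TABLE_CODES c
               ON  c.code_id    = n.code_id
               AND c.code_value = n.code_value

        -- "Row de-normaliser" afterwards, roughly:
        --   group field : row_id
        --   key field   : field_name
        --   one target field per dimension column, value taken from description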
    Last edited by MattCasters; 10-11-2011 at 06:33 AM.

  6. #6
    Join Date
    Apr 2009
    Posts
    337

    Default

    OK, here goes the sample:

    code id | code value | description
    1       | abc        | I am madhu
    2       | xyz        | This is my sample

    So now, in my dimension, in columnA I need "I am madhu" and in columnB I need "This is my sample".

    I am fetching the same description column from the lookup table, but the inputs to the lookup (for code id and value) will be different source columns, say columns A,B from the source for one lookup and columns C,D for another...

    Regards,
    my signature got vanished!
    Last edited by madhupenta; 10-05-2011 at 01:27 PM.

  7. #7
    Join Date
    Apr 2009
    Posts
    337

    Default

    Hi Matt,

    I need to perform multiple (20) joins with the same lookup table then. For example, in my sample, the lookup table columns are codeid and codeval, and the return from the lookup is description... but I need to join this code id and code value multiple times with different columns from my stream/source!
    Regards,
    Madhu

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Again, there is no need to do 20 look-ups. You can do just one join with a single row de-normalisation step. It will be faster too.
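
    With your sample, after the join the stream would look something like this (row_id and the field names are just placeholders):

    row_id | field   | code id | code value | description
    1      | columnA | 1       | abc        | I am madhu
    1      | columnB | 2       | xyz        | This is my sample

    and the "Row de-normaliser" (grouping on row_id, keyed on field) flattens it back to:

    row_id | columnA    | columnB
    1      | I am madhu | This is my sample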

  9. #9
    Join Date
    Apr 2009
    Posts
    337

    Default

    OK, so that means I take the joining logic into SQL / a Merge Join, right? Thanks... will check it out with the Row Normaliser/De-normaliser!
    Regards,
    Madhu

  10. #10
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Sure, you can do the join in SQL and de-normalize using a Kettle step. It's easier to maintain the key (or code)-to-field mapping that way, along with the data type conversions.
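
    For example (just a sketch; pk and the col_* names are placeholders for your real source columns):

        SELECT u.row_id, u.field_name, tc.description
        FROM (
               SELECT pk AS row_id, 'columnA' AS field_name,
                      col_a AS code_id, col_b AS code_value
               FROM   source_table
               UNION ALL
               SELECT pk, 'columnB', col_c, col_d
               FROM   source_table
               -- ... one SELECT per (code id, code value) pair
             ) u
        LEFT JOIN TABLE_CODES tc
               ON  tc.code_id    = u.code_id
               AND tc.code_value = u.code_value

    Feed that into a Table Input step and let the "Row de-normaliser" (group on row_id, key on field_name) produce the final 20 columns.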
