Hitachi Vantara Pentaho Community Forums

Thread: Read and write to lookup dataset

  1. #1

    Read and write to lookup dataset

    I am new to Pentaho, though not new to ETL, and I am looking to design a process that both reads from lookups and writes back to those same lookup sources, and does so efficiently. I have looked at the "Database lookup" and "Database join" steps, but since I will be processing a large dataset, I do not want to run a database query for every row, as that is very expensive. I know I can tell "Database lookup" to cache data, and even to cache a whole table, but I would be updating that same table as rows are processed and would need the cache to reflect those updates, which to my understanding is not possible. Any thoughts on how to attack this type of problem without querying the DB for each row I process?
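    One common way to get cached reads plus writes that the next row can see is a write-through cache: load the lookup table into memory once, serve hits from memory, and on a miss write the new row to the database and add it to the cache in the same step. The Python sketch below assumes a hypothetical `lookup` table with columns `id INTEGER PRIMARY KEY` and `code TEXT UNIQUE`, and uses `sqlite3` only so it runs stand-alone; it illustrates the idea and is not a Pentaho step.

```python
import sqlite3


class WriteThroughLookup:
    """Hypothetical helper: a whole-table cache kept in sync with its table.

    Assumes a table `lookup(id INTEGER PRIMARY KEY, code TEXT UNIQUE)` and
    that this process is the only writer while it runs.
    """

    def __init__(self, conn):
        self.conn = conn
        # One query up front instead of one query per input row.
        self.cache = dict(conn.execute("SELECT code, id FROM lookup"))

    def get_or_insert(self, code):
        # Cache hit: no database round trip.
        if code in self.cache:
            return self.cache[code]
        # Cache miss: write and commit the new row, then update the cache so
        # the very next row that uses this code finds it in memory.
        cur = self.conn.execute("INSERT INTO lookup (code) VALUES (?)", (code,))
        self.conn.commit()
        self.cache[code] = cur.lastrowid
        return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lookup (id INTEGER PRIMARY KEY, code TEXT UNIQUE)")
    conn.execute("INSERT INTO lookup (code) VALUES ('A')")
    conn.commit()
    lookup = WriteThroughLookup(conn)
    print([lookup.get_or_insert(c) for c in ["A", "B", "A", "B", "C"]])  # prints [1, 2, 1, 2, 3]
```

    The cache stays correct only while nothing else writes to the table during the run and rows are handled one at a time, in order.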

    I should also note that I need to process these records row by row, so that each row is fully committed to the dataset before the next row attempts a lookup. I believe the "Single Threader" step can give me this behavior. Is this assumption correct?

    Thanks in advance...

  2. #2


    Dear Simon,

    I am also a beginner (though you have not been one since Dec. 2011), and I must pick up the knowledge ASAP.
    My current task is to load HL7 healthcare data into different tables based on segment type. HL7 messages repeat the same person's data (the PID segment) in every message, and I must prevent duplicate patient records in the destination table.
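    For the duplicate-patient problem, the core idea is to key each patient on an identifier from the PID segment and skip identifiers already seen. This Python sketch assumes HL7 v2 messages with the default delimiters (segments separated by carriage returns, fields by `|`, repetitions by `~`, components by `^`) and uses the first component of the first repetition of PID-3 (patient identifier list) as the key; `unique_patients` is a hypothetical name. It only removes duplicates within the incoming stream; patients already stored in the destination table still need a lookup against that table.

```python
from typing import Iterable, Iterator, Tuple


def unique_patients(messages: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (patient_id, PID segment) the first time each patient ID appears.

    Hypothetical helper; assumes default HL7 v2 delimiters.
    """
    seen = set()
    for message in messages:
        for segment in message.split("\r"):  # HL7 v2 segment terminator
            fields = segment.split("|")
            # For non-MSH segments, split index n is field n, so fields[3] is PID-3.
            if fields[0] != "PID" or len(fields) < 4:
                continue
            # First repetition (~), first component (^): the ID number itself.
            patient_id = fields[3].split("~")[0].split("^")[0]
            if patient_id and patient_id not in seen:
                seen.add(patient_id)
                yield patient_id, segment
```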

    Yesterday I tried the "Database lookup" and "Dynamic SQL row" steps, but they didn't work for me.

    If you have time and would like to help, please send me some instructions on using these two steps.

    Török László

  3. #3
    Join Date
    Nov 1999


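    Török, try the "Combination lookup/update" step if you're populating a table with a new (technical) primary key: it looks up the combination of key fields and only inserts a row when that combination is new.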
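    The lookup-or-insert behaviour described there can be sketched as follows: look up the combination of natural-key fields; if it exists, return its technical key, otherwise insert it and return the newly generated key. This Python/`sqlite3` version illustrates the idea only and is not Pentaho's implementation; the `dim_patient` table, its `patient_tk` column and the three key columns are hypothetical. It issues one query per row, so in practice you would put a cache in front of it.

```python
import sqlite3
from typing import Tuple

# Hypothetical natural key: the combination of fields that identifies a patient.
KEY_COLUMNS = ("last_name", "first_name", "birth_date")

# Note: NULL key values never match with "=", so they would always be inserted.
_SELECT = "SELECT patient_tk FROM dim_patient WHERE " + " AND ".join(
    f"{col} = ?" for col in KEY_COLUMNS
)
_INSERT = "INSERT INTO dim_patient ({}) VALUES ({})".format(
    ", ".join(KEY_COLUMNS), ", ".join("?" for _ in KEY_COLUMNS)
)


def combination_lookup(conn: sqlite3.Connection, key: Tuple[str, str, str]) -> int:
    """Return the technical key for `key`, inserting a new row if it is unseen."""
    row = conn.execute(_SELECT, key).fetchone()
    if row is not None:
        return row[0]                 # existing combination: reuse its key
    cur = conn.execute(_INSERT, key)  # new combination: the DB assigns patient_tk
    return cur.lastrowid
```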

    Simon, it's usually more efficient to join the data not row by row (Database lookup, Database join) but all rows at once, for example with the "Merge Join" step. That way you get good performance and very little memory usage. Just make sure both inputs are sorted on the join key in the database (for example with ORDER BY) before joining.
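    A merge join reads both inputs once, in key order, and only needs to hold the rows that share the current key, which is where the low memory use comes from. This Python generator is a sketch of the technique (an inner join), not the Pentaho step itself; it assumes both inputs are already sorted ascending on the same key, and `merge_join` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, Iterator, Tuple

_END = object()  # sentinel marking an exhausted input


def merge_join(left: Iterable[Any], right: Iterable[Any],
               key: Callable[[Any], Any] = itemgetter(0)) -> Iterator[Tuple[Any, Any]]:
    """Inner-join two inputs that are both sorted ascending on `key`."""
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)
    lk, lrows = next(left_groups, (_END, None))
    rk, rrows = next(right_groups, (_END, None))
    while lk is not _END and rk is not _END:
        if lk < rk:
            # No match for this left key: advance left (groupby skips the rest of the group).
            lk, lrows = next(left_groups, (_END, None))
        elif lk > rk:
            rk, rrows = next(right_groups, (_END, None))
        else:
            # Only the right-hand rows for the current key are buffered.
            matches = list(rrows)
            for lrow in lrows:
                for rrow in matches:
                    yield lrow, rrow
            lk, lrows = next(left_groups, (_END, None))
            rk, rrows = next(right_groups, (_END, None))
```

    Because both inputs are sorted, a key missing from one side is detected as soon as the other side moves past it, so no full lookup table has to be held in memory.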


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.