Hitachi Vantara Pentaho Community Forums
Results 1 to 4 of 4

Thread: Database lookup step never completing

  1. #1
    Join Date
    Apr 2011
    Posts
    13

    Question Database lookup step never completing

    I've got a transformation that inputs records from XML, uses a lookup step to see if they already exist in table and if not then outputs them to the table. The lookup step and the table output step both point to the same table (MSSQL2008). I don't get any errors when running the transformation... the lookup step just never completes and all steps downstream of it also never complete. The other odd thing is that a varying number of records make it into the table before it "hangs" - meaning if I have my XML file has 20 records, I'll run the transform once and I get 7 records in the table; the next time I run it only 3 records make it and the next time 10. It's strange. The goal of the lookup logic is to prevent duplicate records from being committed no matter if they're in the same source XML file or not. If there's a better way to do this, I'm open to it. My basic transform is:
    • XML read
    • Sort on natural key
    • Pass unique on natural key
    • Lookup natural key in table1 - return key column as "match" field - default to 0
    • Case 1 from lookup step - If match field value = 0 then Send to table output step (meaning new record)
    • Case 2 from lookup step - Default step (all values except 0) Sent to dummy step (meaning existing record)
    • Table Output - committ new record to table1
    Thanks,
    -nth-

  2. #2
    Join Date
    Mar 2006
    Posts
    170

    Default

    Hi nth,

    You could possibly do:

    XML Read
    Insert/Update step to your Target Table

    If you set the PK correctly in the Insert/Update step it should insert records that don't exist and Not update records that do ... unless there is something in the XML version of that record that needs to be updated in the Target table. If that is the case you can compare more fields than just the PK of the Target table.

    -> also look at the "don't do any updates" setting within the Insert/Update step ... might also help you if you don't want to do any updates for existing records

    Doesn't sound like you need that however ... so just the XML input and Insert/Update step should work.

    BTW you will just cause yourself headaches and much stress trying to READ and WRITE to the same table in a DB within the same Transformation. If you MUST do something like that then you need to have that read and write logic split into 2 or more separate trans within a job.

    Good Luck

    Kent
    Last edited by kandrews; 12-01-2011 at 05:57 PM. Reason: more info

  3. #3
    Join Date
    Apr 2011
    Posts
    13

    Default

    Thanks Kent I'll give the insert/update a go. In the meantime I narrowed down my issue. If I redesign from scratch a transformation with the steps I outlined; it works. If I go into the table output step and check mark the "return autogenerated key" and put in the value and THEN try to run the transform again - the lookup step never completes. Interestingly if I uncheck the return autogenerated key and try the transform again, the lookup still never completes. I've run a textdiff on the Kettle files to see what's going on and have come up with the following scenarios:

    Scenario 1 - Lookup step works properly - Table Output return autokey not checked
    Relevant kettle file XML field values in Table Output step:
    <use_batch>Y</use_batch>
    <return_keys>N</return_keys>
    <return_field/>
    Scenario 2 - Lookup step never finishes - Table Output return autokey checked
    <use_batch>N</use_batch>
    <return_keys>Y</return_keys>
    <return_field>Table1_PKID</return_field>

    Scenario 3 - Lookup step never finishes - Table Output return autokey not checked (but was previously checked)
    <use_batch>N</use_batch>
    <return_keys>N</return_keys>
    <return_field>Table1_PKID</return_field>

    So the main issue appears to be something about turning off batch mode on the table output step affecting the lookup step upstream (which I'm not quite sure I understand). The secondary issue is that I missed the fact that I've got to turn back on batch mode when turning off the return auto key.


    Thanks,
    -nth-

  4. #4
    Join Date
    Apr 2011
    Posts
    13

    Default

    I ended up testing the Insert/Update and it worked well, so thanks for that.

    I also was able to get my original workflow to work by (in table output) setting batch mode to on or when batch mode is off; setting the table commit size to one. (This particular workflow won't be loading massive number of records at a time)

    I'm still not sure I understand why that table output batch mode setting would affect the upstream lookup step. Was this just a known consequence of running a lookup against the same table I was saving data to?

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.