Hitachi Vantara Pentaho Community Forums
Results 1 to 16 of 16

Thread: Inconsistent Results: DB Lookup vs. Execute SQL steps

  1. #1

    Exclamation Inconsistent Results: DB Lookup vs. Execute SQL steps

    Greetings!

    I have written a transformation that looks up surrogate key values in a "master table" based on the contents of a temp table.

    1) using the Database Look-up step, my transformation outputs 3 rows. The debug output shows the SQL query executed:
    Code:
    SELECT SK_ID, ACCOUNT_NUMBER FROM master_table WHERE CARDID = ?
    2) using the Execute SQL step, my transformation outputs 4 rows. The debug output shows the SQL query executed:
    Code:
    INSERT INTO target_table ((
    SELECT  SK_ID , ACCOUNT_NUMBER,
    FROM master_table WHERE CARDID in (select CARDID from temp_table)
    ))
    The only difference that I can see between both steps is that 1) using the EQUAL operator whereas 2) allows me to use the IN operator.

    I would rather use the DB Lookup step as the Execute SQL step is *very* slow.

    For the life of me, I cannot figure out why I get different results, where is my transformation erroneous? Any ideas?

    Thanks,

    Al.

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    What is the input to the database lookup step?

    Are you doing a Table Input -> Table Input type transform?

  3. #3

    Default

    What I am doing is this:

    Table input (select ...from temp_table) ->
    Database lookup (select ...from master_table->
    Table Output (Insert into target_table)

    Makes sense?

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    I suspect nulls .... we'd have to see your data for further comments.

    Regards,
    Sven

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    "Execute SQL step is *very* slow."
    That is by design. It was designed to executes DDL, not queries.

  6. #6

    Post

    1) Matt => I was not complaining about the slowness. I simply made a statement of fact.

    2) Sboden => no null in data.

    a) data in `temp_table`
    Code:
    CARDID 
    2161887621
    1912528427
    2185314081
    2370921955
    b) data in `master_table`
    Code:
    SK_ID    ACCOUNT_NUMBER
    1    2161887621
    2    1912528427
    3    2185314081
    4    2370921955
    5    151374963
    Any suggestion welcome....

  7. #7
    Join Date
    May 2006
    Posts
    4,882

    Default

    Give us the data in your tables again, but now add explicit begin- and end-characters in it... something as "select '[' || CARD_ID || ']' ... "

    Regards,
    Sven

  8. #8

    Default

    a) data in `temp_table`
    [2161887621]
    [1912528427]
    [2185314081]
    [2370921955]
    [2370921955]

    b) data in `master` table
    [1] [2161887621]
    [2] [1912528427]
    [3] [2185314081]
    [4] [2370921955]
    [5] [151374963]


    Al.

  9. #9
    Join Date
    May 2006
    Posts
    4,882

    Default

    mmmm ... you data doesn't match anymore with the original example... now you have duplicates in your temp table which does impact the number of rows you get, but the reverse from your example... the "insert select" should give you less rows.

    For your query:
    INSERT INTO target_table ((
    SELECT SK_ID , ACCOUNT_NUMBER,
    FROM master_table WHERE CARDID in (select CARDID from temp_table)
    ))
    - , after ACCOUNT_NUMBER in the select (probably from making the query smaller)
    - CARDID in the where clause is wrong, it should be account_number in your example.

    I tried your example in Oracle and both database join and your insert-select work as I expect... with duplicates in the temp table the insert-select returns less rows as the in filters out duplicates.

    Regards,
    Sven

  10. #10
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Could you also post the layout of your tables?

    The tables as you have already posted appear to be:

    Temp_Table
    CARDID

    Master_Table
    SK_ID
    Account_Number

    But your queries don't match that.
    SELECT SK_ID, ACCOUNT_NUMBER FROM master_table WHERE CARDID = ?
    The master_table (as described) doesn't have CARDID, which shouldn't work at all!


    I would agree with Sven - I can see how if you have a CARDID in the Temp_Table more than once, you would get MORE rows back than in the insert(select()) statement, but not less.

  11. #11

    Default

    My mistake...I failed to give you the complete layout (got a bald cold I cant't seem to shake...thinking is impaired). Here goes:

    1) TEMP TABLE (used for daily dump of `facts` from OLTP source)

    Name Null? Type
    ----------------------------------------- -------- ----------------------------
    [CARDID][NUMBER]
    [NUMBEROFRESETS][NUMBER]
    ... ...
    [LOAD_DATE]][DATE]

    1.a) data in `temp` table
    [2161887621][1069379]
    [1912528427][72797]
    [2185314081][94668]
    [2370921955][23216]
    [2370921955][922]

    2) MASTER TABLE (this is a `dimension` table)

    Name Null? Type
    ----------------------------------------- -------- ----------------------------
    [SK_ID][NUMBER]
    [CARD_ID][NUMBER]
    [ACCOUNT_NUMBER][NUMBER]
    ... ...

    2.a) data in `master` table`

    [1] [ 2161887621] [200080]
    [2] [1912528427] [200347]
    [3] [2185314081] [200583]
    [4] [2370921955] [200959]
    [5] [151374963] [200976]


    Now you have the complete picture. Sorry about before.

    Thanks for your input.

    Al.

  12. #12
    Join Date
    May 2006
    Posts
    4,882

    Default

    Is the space in front CARD_ID of line [1] in the master table intentional or a typo?

    Regards,
    Sven

  13. #13

    Default

    It is a typo.

  14. Default Which step drops the data?

    Quote Originally Posted by acbonnemaison View Post
    What I am doing is this:

    Table input (select ...from temp_table) ->
    Database lookup (select ...from master_table->
    Table Output (Insert into target_table)

    Makes sense?
    Do you know which step drops the data? Does the TableInput read only three rows?
    Does the TableOutput write only three rows?

    P.S.: While the Execute SQL might be slow. I think it should be faster than any directly executed SQL statement, given large enough numbers (as the database can optimize and do all things local). But then for large numbers your IN set gets pretty large, so it is probably swapped to file system buffers, etc. I'd guess your percived performance issue is on of the database, its configuration and hardware.

  15. #15

    Default

    Quote Originally Posted by PlanBForOpenOffice View Post
    Do you know which step drops the data? Does the TableInput read only three rows?
    Does the TableOutput write only three rows?
    Table input reads 5 rows. Table output writes only 3 rows.

    It seems that, by looking at the debug log, that records get dropped by the database lookup step.

    What I cannot figure out is why the Execute SQL step is not?

  16. #16
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Database lookup doesn't drop rows in the default configuration.
    As mentioned, "Exec SQL" is certainly not designed to do database lookups so that discussion ends there.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.