Hitachi Vantara Pentaho Community Forums

Thread: Execute SQL Script challenge (quote strings vs. not quote strings?)

  1. #1
    Join Date
    Sep 2014
    Posts
    175

    Default Execute SQL Script challenge (quote strings vs. not quote strings?)

    Hey, I'm playing around with Execute SQL Script as opposed to Table Output / Update, because it seems to be faster (6 seconds for 1000 rows vs. 9 seconds).

    Probably not super important long term.

    I'm taking a JSON document, parsing it, and sending it to the SQL database.

    In Table Output this works just fine. The JSON input contains values like 1, 2, , dog .... notice that the 3rd value is empty.

    When an empty value is inserted using the Table Output step, it's simply converted to NULL in the database. How exactly that happens, I'm not sure.


    Now .... let's say I'm doing the same thing in Execute SQL Script using insert into table (a,b,c) values (?,?,?) using the fields as parameters.

    Suddenly there is a problem.

    That's because the fields are 1, 2, , dog, as well as things like the following:
    {3,faler??ca,,} aka random garbage with a lot of commas and other SQL-breaking syntax.

    The clear solution? Quote strings!

    The {3,faler??ca,,} garbage becomes '{3,faler??ca,,}' and no longer breaks the SQL - it gets inserted as the string it is.

    HOWEVER ... now the empty values () become ''. What's the problem with that? Well, it turns out an unquoted empty value becomes NULL ... but a quoted empty string gets rejected as an "invalid integer." The database apparently can't convert '' into an integer column.

    I'm guessing the only way around this is to quote everything except "empties" between the JSON parsing and the SQL script run. A bit of a pain though; maybe I'll just accept the ~50% processing overhead instead.
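
    To make it concrete, here's roughly what the substituted statement ends up looking like in each case (table and column names are made up; a and c are integer columns, b is text):

        -- unquoted parameters: the comma-laden value splits the VALUES list and breaks the statement
        insert into mytable (a, b, c) values (1, {3,faler??ca,,}, );

        -- quoted parameters: the garbage goes in safely as a string, but the empty value
        -- arrives as '' instead of NULL and fails on the integer column c
        insert into mytable (a, b, c) values ('1', '{3,faler??ca,,}', '');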

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    I found run-time efficiency to depend very much on the Table-Output setting for "Commit size".
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Sep 2014
    Posts
    175

    Default

    Quote Originally Posted by marabu View Post
    I found run-time efficiency to depend very much on the Table-Output setting for "Commit size".

    Right -- I definitely have that piece handled. I have 1000 rows being added, so I tried both 1000 commit size and 1000/n (where n = number of instances of the step running) --- and the time is about the same.

    For some reason the Execute SQL Script is faster (I think it executes only one statement if you check a certain box).



    This may be because of the exact syntax used. In Execute SQL Script I can use "Insert Into .... On Duplicate Key Update", whereas the Insert/Update step (or the Table Output with errors routed to an Update step) are both slower, because the SQL that Pentaho generates behind the scenes may be less optimal (I don't know what it actually runs).
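
    Something along these lines (MySQL syntax, made-up table and columns, with the fields listed as parameters in the step):

        insert into mytable (id, a, b)
        values (?, ?, ?)
        on duplicate key update a = values(a), b = values(b);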

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    You can turn on logging on the server side to learn about the Kettle SQL statements.
    In your own SQL statement you might find NULLIF() helpful.
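
    For example (made-up table and columns; depending on your database you may need an explicit cast for the integer columns):

        -- turn an empty substituted value into NULL instead of ''
        insert into mytable (a, b, c)
        values (NULLIF('?', ''), '?', NULLIF('?', ''));
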
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,690

    Default

    When setting up a "Table Output; On Error Update" transform (Using steps, not SQL Execute), be sure to turn off the lookup in the Update step. Otherwise you'll burn a DB roundtrip confirming that the row is still there.

    Also, make sure you use the Update step, not the Insert/Update step.

    Insert/Update will use (at least) two round-trips to the DB per row.
    First one looks up the row to see if it's there. Then depending on if the row exists or not, it will fire either an update or an insert.
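
    Roughly, per incoming row, it boils down to something like this (not the exact statements Kettle generates):

        SELECT a, b FROM mytable WHERE id = ?;          -- lookup round trip
        -- then, depending on whether the row was found:
        UPDATE mytable SET a = ?, b = ? WHERE id = ?;
        -- or
        INSERT INTO mytable (id, a, b) VALUES (?, ?, ?);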

  6. #6
    Join Date
    Sep 2014
    Posts
    175

    Default

    Quote Originally Posted by gutlez View Post
    When setting up a "Table Output; On Error Update" transform (Using steps, not SQL Execute), be sure to turn off the lookup in the Update step. Otherwise you'll burn a DB roundtrip confirming that the row is still there.

    Also, make sure you use the Update step, not the Insert/Update step.

    Insert/Update will use (at least) two round-trips to the DB per row.
    First one looks up the row to see if it's there. Then depending on if the row exists or not, it will fire either an update or an insert.
    Thanks for the advice--

    Yeah, I did turn off the lookup, and I have tried both the Insert/Update step by itself (multiple copies) .... as well as the Table Output, on Error (Only Update) approach (multiple copies). Execute SQL Script is still faster.

    I think I'll try manually putting quotes around each '?' parameter that needs them, and none around the ones that don't; that may be the cleanest fix. On the other hand, it may be more maintainable (and clearer to document) if I just use a simple Insert/Update step in Pentaho.
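
    In other words, something like this (made-up columns: a and c numeric, b text):

        -- quote only the string parameters; leave the numeric ones bare
        insert into mytable (a, b, c)
        values (?, '?', ?);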
