Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Batch Insert/Update from Spoon v4 to MySQL

  1. #1
    Join Date
    Aug 2010
    Posts
    6

    Default Batch Insert/Update from Spoon v4 to MySQL

    I'm trying to accomplish something pretty simple: (1)I have a table input, then (2)"add value fields changing sequence" transformation, (3)a formula to append the sequence number returned from previous transformation to one of the fields from the input table, and finally (4) insert/update the same table, with update only, using PK.

    It doesn't look to me Spoon batches up the update statements.

    Is there a param somewhere in any transformation (couldn't find one in insert/update) that allows batching up/rewriting the statement, or I need to modify the JDBC connection string from Spoon using the rewrite option to have the driver do it?

    Thanks.

  2. #2
    Join Date
    Jun 2007
    Posts
    233

    Default

    Hi griffoun,

    I am not sure exactly what you are saying, but if I understand you correctly you may have approached your transformation with more complexity than it needs as well as missing some of the options you can set to optimise performance. To answer your batch insert / update question directly you can change the commit size setting in the transformation step. This is akin to setting your batch size for updates and inserts.

    If I understand your transformation correctly you are reading data from a table / view then checking some fields for different values to add a corresponding sequence number, then concatenating the sequence number to one of your fields that you already have, then using insert / update to push the modified data to the DB (presumably the same table). If this is the case you are not generating new rows and might be better served by using simply the update step and trying different commit sizes for the batching. If you are pushing all the data 'new' to a new table / empty table you might want to try the MxSQL Bulk Loader (still considered experimental).

    If you want, you can post your transformation in this thread and we can take a look at it.

    Cheers

    The Frog
    Everything should be made as simple as possible, but not simpler - Albert Einstein

  3. #3
    Join Date
    Aug 2010
    Posts
    6

    Default

    The difference between commit size and batch size, to my understanding:

    A process can send, say 100 INSERT statement, each with 1 row. So there're 100 "batches", each batch with 1 row. The ETL then request commiting the new rows.

    What I'm looking for, is to have Spoon or JDBC to rewrite the Insert, so that there'll only be 1 insert for 100 rows.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I implemented it for the Update step recently, it shouldn't be too hard to do for the "Insert/Update" step. I'm not sure the performance is going to be all that affected though or near impossible on certain RDBMS. An alternate strategy for doing bulk inserts updates might work better for you. (for example join the data streams to do bulk lookups, then pass the data streams to table output and update steps).

  5. #5
    Join Date
    Nov 2008
    Posts
    143

    Default

    Quote Originally Posted by griffoun View Post
    ...(4) insert/update the same table, with update only, using PK...
    How is it that you're updating only in a insert/update step? Is it not "inserting only"?
    If it is indeed inserting only, then you can change your logic and replace the insert/update by a triad with database lookup, filter rows and table output.

    If you only mean to update, then I guess you'll have to file a jira issue for Matt's suggestion and wait for it to come out, or use the same triad idea, replacing the table output by an update step.

    HTH,

    Renato

  6. #6
    Join Date
    Aug 2010
    Posts
    6

    Default

    Instead of showing the steps, let me illustrate what I'd like to do:

    I have following columns in a table:
    Seq - Sequence number, PK
    Col1 - String, populated
    Col2 - String, blank

    Say my data look like this to begin with:
    1 MyStr1 <blank>
    2 MyStr1 <blank>
    3 MyStr1 <blank>
    4 MyStr2 <blank>
    5 MyStr2 <blank>
    6 MyStr3 <blank>

    Here's how I want it to look like after a process is run:
    1 MyStr1 MyStr1-1
    2 MyStr1 MyStr1-2
    3 MyStr1 MyStr1-3
    4 MyStr2 MyStr2-1
    5 MyStr2 MyStr2-2
    6 MyStr3 MyStr3-1

    Feel free to suggest any transformation you may have to make it simpler. I already got the task accomplished without the "batching" I was hoping for, still I'd like to learn from all of you from this simple example.

    Thanks!

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    So the number after the repeated String is the occurrence number in that group, ordered by the primary key?
    To be honest, I wouldn't know how to do it in SQL ;-)

    However, with Kettle you can read the whole table, ordered by col1, pk.
    Then use the "group by" step, include all row, include the sequence number (groupSeq) in the group (col1 is the group).
    Concatenate using a User Defined Java Expression: col2: col1+"-"+groupSeq.

    Then you can update the target table again with a simple Update step. In the latest drops there should be batch updates, try 4.2.0-M1 on Hudson for example.

    HTH,
    Matt

  8. #8
    Join Date
    Aug 2010
    Posts
    6

    Default

    Thanks for the suggestion, Matt

    Batching is very important to me because I'm currently working on task to sync a few tables between the main and remote (as fail-over) MySQL DB. The connection to the remote DB is very slow, and there was once I tried executing 100K inserts for a developer and it took hours to complete. I then asked the developer to rewrite his code and batch the values together, and the inserts were done in seconds.

    Oh, and here's the location to the 4.2 build, correct?

    http://ci.pentaho.com/view/Data%20In...-M1-r14791.zip

    What about the JDBC connection string? Does it need to be modified, or the app is going to generate the SQL statements accordingly?

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.