Hitachi Vantara Pentaho Community Forums

Thread: Table Output Performance - MySQL

  1. #1
    Join Date
    Apr 2008
    Posts
    1,770

    Question Table Output Performance - MySQL

    Hi.
    I know this has already been discussed a few times in the past, but I find the performance of Table Output staggeringly slow.

    I had to upload records to a table on a remote server, so I tried PDI:
    Text file input, Table Output using MySQL.
    The loading speed was 40-50 records/second.

    Since it would have taken hours to complete, I gave up and used Toad instead (http://www.toadworld.com/).
    It loaded all my records in less than an hour!

    Since the PC was the same, same file, same internet connection, same destination database (and table), what is PDI doing differently?

    My configuration:
    PDI 4.4
    Win Server 2003
    MySQL driver: mysql-connector-java-5.1.22-bin.jar

    MySQL version on remote database: 5.1.68-cll

    What should I change in my PDI configuration to match the upload speed of Toad?

    Mick

  2. #2

    Default

    With "Use batch update for inserts" enabled, you can usually reach a throughput of several thousand records per second on basic infrastructure. You can optimize further by adjusting the commit size (for the batch updates) and enabling parallel writes (right-click the step and set "Change number of copies to start" to a value greater than 1).
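
    As a rough illustration of what batch updates buy you, here is a minimal JDBC sketch (hypothetical host, credentials and table; PDI's real Table Output step does more, but the mechanics are the same). Connector/J's rewriteBatchedStatements=true option also helps a lot, since it collapses a batch into multi-row INSERT statements:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;

        public class BatchInsertSketch {
            public static void main(String[] args) throws Exception {
                // Hypothetical connection details.
                String url = "jdbc:mysql://remote-host:3306/testdb?rewriteBatchedStatements=true";
                try (Connection con = DriverManager.getConnection(url, "user", "secret")) {
                    con.setAutoCommit(false); // commit in chunks, like Table Output's commit size
                    try (PreparedStatement ps = con.prepareStatement(
                            "INSERT INTO my_table (id, name) VALUES (?, ?)")) {
                        int commitSize = 1000;
                        for (int i = 0; i < 100000; i++) {
                            ps.setInt(1, i);
                            ps.setString(2, "row-" + i);
                            ps.addBatch();
                            if ((i + 1) % commitSize == 0) {
                                ps.executeBatch(); // one round trip per chunk instead of per row
                                con.commit();
                            }
                        }
                        ps.executeBatch(); // flush the remainder
                        con.commit();
                    }
                }
            }
        }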

    But first, validate whether the output really is the bottleneck. Monitor the "input/output" value in the "Step Metrics" tab while running. In your case, a value of "10000/0" should be shown (when using the default rowset size). If not, you may have a problem elsewhere.

    What is the source of your data? Reading from and writing to the same table in one transformation is a killer when using MySQL with the MyISAM storage engine.
    Last edited by MichaelBieri; 05-08-2013 at 05:19 AM.

  3. #3
    Join Date
    Apr 2008
    Posts
    1,770

    Default

    Hi Michael.
    "Use batch update for inserts" is on by default.

    As I wrote (maybe not clearly), I was uploading a text file into a table, a very simple and straightforward process.

    I will try enabling the parallel writes, but changing the commit size did not have any impact at all.

    What I don't understand is the difference in performance between the two tools.
    What does Toad do to get such good performance by default?

    My wild guess is that under the hood it uses a "load data infile" - even on Windows ;-)

    Mick

  4. #4
    Join Date
    Dec 2010
    Posts
    193

    Default

    Also,

    I have several tables coming from a source. How can I do a MySQL load without specifying target field mappings? Any ideas? Please help me.
    Sathish
    Back to Pentaho


    'Be the best Pearl in the ocean of wisdom'

  5. #5
    Join Date
    Apr 2008
    Posts
    1,770

    Default

    I have several tables coming from a source. How can I do a MySQL load without specifying target field mappings? Any ideas? Please help me.
    I think that if your source field names exactly match the MySQL table field names, then Table Output (or Insert/Update) should work even if you don't specify the target field mapping.

    Mick

  6. #6

    Default

    There are multiple possibilities. Comparing a "bulk load" to a transactional ("Table Output") way of loading is not fair. Bulk loaders normally do not update indexes during loading, they load without transactions, they ignore constraints, and so on.
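
    To give a feel for the difference, here is a hedged JDBC sketch of the session settings a bulk-style load typically relies on (hypothetical connection details and table; unique_checks and foreign_key_checks are standard MySQL session variables):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class BulkStyleLoad {
            public static void main(String[] args) throws Exception {
                // Hypothetical connection details.
                String url = "jdbc:mysql://remote-host:3306/testdb";
                try (Connection con = DriverManager.getConnection(url, "user", "secret");
                     Statement st = con.createStatement()) {
                    // Skip constraint checking during the load, like a bulk loader would.
                    st.execute("SET unique_checks = 0");
                    st.execute("SET foreign_key_checks = 0");
                    con.setAutoCommit(false); // one big transaction instead of per-row flushes
                    st.executeUpdate("INSERT INTO my_table (id, name) VALUES (1, 'demo')");
                    // ... run the rest of the inserts here ...
                    con.commit();
                    // Restore normal checking afterwards.
                    st.execute("SET unique_checks = 1");
                    st.execute("SET foreign_key_checks = 1");
                }
            }
        }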

    But loading 50 records/s is a really bad number, even when using defaults.

    What is the storage engine of your table? Heavy indexes?

    Btw, there is a MySQL bulk loader in PDI as well.

  7. #7
    Join Date
    Apr 2008
    Posts
    1,770

    Default

    The storage engine should be InnoDB and there are 5 indexes.

    Comparing a "bulk load" to a transactional ("table output") way of loading is not fair
    I agree, but PDI performance is still quite poor.

    Btw; there is a mysql bulk loader in PDI as well
    I know, but it does not work on Windows.

    I am not a programmer, so I don't know what Toad does to get such good performance - keep in mind that input and output were exactly the same.
    Same table, same indexes, same storage engine, same data to upload.

    If they use a bulk loading method, why can't PDI do it somehow?

    Mick.

  8. #8
    Join Date
    Dec 2010
    Posts
    193

    Exclamation

    Thanks Mick. Yes, Table Output will work. But the MySQL Bulk Loader loads null or empty data into the target when the fields are not specified. Any workaround? Also, I am trying the following command inside an Execute SQL Script step to do a bulk load, where the fields get replaced from a variable on each iteration:

    mysqlimport -h 10.237.69.125 -P 3306 -u u1celm01 -p password s1celm01 /tmp/Category.fifo -c categoryid,categoryname,subcategoryid,subcategoryname --fields-terminated-by=\; --fields-optionally-enclosed-by=\"

    It does connect to the MySQL database, and it lists all the options and usage help, but after that no processing takes place and it hangs.
    FYI: the MySQL database server and the Pentaho server reside on separate machines.
    Sathish
    Back to Pentaho


    'Be the best Pearl in the ocean of wisdom'

  9. #9

    Default

    You don't really need to use the bulk load utility. You can also load with the "load data infile" statement via the Execute SQL step. I think this is the way Toad does it. It is a simple and fast way to get a well-formed file into a table, but you may run into trouble with error handling or validation if you need those.
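
    For example, a minimal sketch with a hypothetical file, table and connection (the same statement can be pasted into the Execute SQL step), assuming Connector/J with allowLoadLocalInfile enabled:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class LoadDataInfileSketch {
            public static void main(String[] args) throws Exception {
                // Hypothetical connection details; LOCAL makes the client stream
                // the file, so it works when the file lives on the PDI machine.
                String url = "jdbc:mysql://remote-host:3306/testdb?allowLoadLocalInfile=true";
                try (Connection con = DriverManager.getConnection(url, "user", "secret");
                     Statement st = con.createStatement()) {
                    int rows = st.executeUpdate(
                        "LOAD DATA LOCAL INFILE '/tmp/mydata.csv' " +
                        "INTO TABLE my_table " +
                        "FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '\"' " +
                        "LINES TERMINATED BY '\\n' " +
                        "(id, name, category)");
                    System.out.println("Loaded " + rows + " rows");
                }
            }
        }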

    But I'm really confused about your performance experience - I haven't seen such a bad number in years.
    Are you able to analyze your MySQL instance with "show full processlist" while loading?
    Ensure that all your data types are correct. The JDBC driver goes crazy when it has to cast during loading. (In my case I sent an "Integer" field from PDI to a varchar field in MySQL. After adjusting the data type of a single attribute, I got a performance gain of > 300%. Everything worked in both cases - just not at the same speed.)
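
    If you want to script the "show full processlist" check mentioned above, a minimal sketch (hypothetical connection details) would be:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class ProcesslistSketch {
            public static void main(String[] args) throws Exception {
                // Hypothetical connection details; poll the server while the load
                // runs to see whether sessions sit in lock waits, "Sending data", etc.
                String url = "jdbc:mysql://remote-host:3306/testdb";
                try (Connection con = DriverManager.getConnection(url, "user", "secret");
                     Statement st = con.createStatement();
                     ResultSet rs = st.executeQuery("SHOW FULL PROCESSLIST")) {
                    while (rs.next()) {
                        System.out.printf("id=%s user=%s state=%s info=%s%n",
                                rs.getString("Id"), rs.getString("User"),
                                rs.getString("State"), rs.getString("Info"));
                    }
                }
            }
        }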
    Last edited by MichaelBieri; 05-08-2013 at 11:31 AM.

  10. #10
    Join Date
    Apr 2008
    Posts
    1,770

    Default

    Ensure that all your data types are correct.
    Never thought about that.

    I will do a test after checking data types.

    You can also load with the "load data infile" statement via the Execute SQL step.
    Yes, another good option, but it's easier (ehm..) to use Table Output and get the mapping done on the fly.

    I'll check your suggestion about data types and post something later.

    Mick
