Hitachi Vantara Pentaho Community Forums

Thread: Performance vs. DTS

  1. #1

    Default Performance vs. DTS

    Hi,

    we are trying to migrate some processes away from DTS, so I ran a test: let DTS and Spoon each push 770,000 records (approx. 3,200 bytes wide) to another table within the same instance, but a different database, of SQL Server. The results are as follows (in minutes, with minimal logging):

    DTS  Spoon
     21     44
     24     45
     20     47

    I did not expect DTS to be so much faster. What did I do wrong, and what do I have to set? I am a complete beginner with Pentaho. Please find the transformation below.

    Thank you

    <?xml version="1.0" encoding="UTF-8"?>
    <transformation>
    <info>
    <name>thruput</name>
    <directory>&#47;test_ov</directory>
    <log>
    <read/>
    <write/>
    <input/>
    <output/>
    <update/>
    <connection/>
    <table/>
    <use_batchid>Y</use_batchid>
    <use_logfield>N</use_logfield>
    </log>
    <maxdate>
    <connection/>
    <table/>
    <field/>
    <offset>0.0</offset>
    <maxdiff>0.0</maxdiff>
    </maxdate>
    <size_rowset>1000</size_rowset>
    <sleep_time_empty>1</sleep_time_empty>
    <sleep_time_full>1</sleep_time_full>
    <unique_connections>N</unique_connections>
    <feedback_shown>Y</feedback_shown>
    <feedback_size>5000</feedback_size>
    <using_thread_priorities>Y</using_thread_priorities>
    <shared_objects_file/>
    <dependencies>
    </dependencies>
    <partitionschemas>
    </partitionschemas>
    <slaveservers>
    </slaveservers>
    <clusterschemas>
    </clusterschemas>
    <modified_user>admin</modified_user>
    <modified_date>2007&#47;03&#47;22 17:22:39.573</modified_date>
    </info>
    <notepads>
    </notepads>
    <connection>
    <name>BKGR_datamart</name>
    <server>gersql01</server>
    <type>MSSQL</type>
    <access>Native</access>
    <database>bkgr_datamart;instance=PROD</database>
    <port>1433</port>
    <username>bkgr_dwh</username>
    <password>Encrypted 2be98afc86aa7f2e4a912a962e196b8d2</password>
    <servername/>
    <data_tablespace/>
    <index_tablespace/>
    <attributes>
    <attribute><code>EXTRA_OPTION_MYSQL.defaultFetchSize</code><attribute>500</attribute></attribute>
    <attribute><code>EXTRA_OPTION_MYSQL.useCursorFetch</code><attribute>true</attribute></attribute>
    <attribute><code>IS_CLUSTERED</code><attribute>N</attribute></attribute>
    <attribute><code>MAXIMUM_POOL_SIZE</code><attribute>10</attribute></attribute>
    <attribute><code>PORT_NUMBER</code><attribute>1433</attribute></attribute>
    <attribute><code>STREAM_RESULTS</code><attribute>Y</attribute></attribute>
    <attribute><code>USE_POOLING</code><attribute>N</attribute></attribute>
    </attributes>
    </connection>
    <connection>
    <name>BKGR_Staging</name>
    <server>gersql01</server>
    <type>MSSQL</type>
    <access>Native</access>
    <database>bkgr_staging;instance=PROD</database>
    <port>1433</port>
    <username>bkgr_dwh</username>
    <password>Encrypted 2be98afc86aa7f2e4a912a962e196b8d2</password>
    <servername/>
    <data_tablespace/>
    <index_tablespace/>
    <attributes>
    <attribute><code>EXTRA_OPTION_MYSQL.defaultFetchSize</code><attribute>500</attribute></attribute>
    <attribute><code>EXTRA_OPTION_MYSQL.useCursorFetch</code><attribute>true</attribute></attribute>
    <attribute><code>IS_CLUSTERED</code><attribute>N</attribute></attribute>
    <attribute><code>MAXIMUM_POOL_SIZE</code><attribute>10</attribute></attribute>
    <attribute><code>PORT_NUMBER</code><attribute>1433</attribute></attribute>
    <attribute><code>STREAM_RESULTS</code><attribute>Y</attribute></attribute>
    <attribute><code>USE_POOLING</code><attribute>N</attribute></attribute>
    </attributes>
    </connection>
    <order>
    <hop> <from>Table input</from><to>Table output</to><enabled>Y</enabled> </hop> </order>

    <step>
    <name>Table input</name>
    <type>TableInput</type>
    <description/>
    <distribute>Y</distribute>
    <copies>1</copies>
    <partitioning>
    <method>none</method>
    <field_name/>
    <schema_name/>
    </partitioning>
    <connection>BKGR_Staging</connection>
    <sql>SELECT * FROM c_daily</sql>
    <limit>0</limit>
    <lookup/>
    <execute_each_row>N</execute_each_row>
    <variables_active>N</variables_active>
    <cluster_schema/>
    <GUI>
    <xloc>296</xloc>
    <yloc>450</yloc>
    <draw>Y</draw>
    </GUI>
    </step>

    <step>
    <name>Table output</name>
    <type>TableOutput</type>
    <description/>
    <distribute>Y</distribute>
    <copies>1</copies>
    <partitioning>
    <method>none</method>
    <field_name/>
    <schema_name/>
    </partitioning>
    <connection>BKGR_Staging</connection>
    <schema>dbo</schema>
    <table>ov_thruput</table>
    <commit>100</commit>
    <truncate>Y</truncate>
    <ignore_errors>N</ignore_errors>
    <use_batch>Y</use_batch>
    <partitioning_enabled>N</partitioning_enabled>
    <partitioning_field/>
    <partitioning_daily>N</partitioning_daily>
    <partitioning_monthly>Y</partitioning_monthly>
    <tablename_in_field>N</tablename_in_field>
    <tablename_field/>
    <tablename_in_table>Y</tablename_in_table>
    <return_keys>N</return_keys>
    <return_field/>
    <cluster_schema/>
    <GUI>
    <xloc>466</xloc>
    <yloc>455</yloc>
    <draw>Y</draw>
    </GUI>
    </step>

    </transformation>

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    290 rows/s indeed seems very slow, even if you double that number for DTS.
    Is there a CPU bottleneck? A network problem? Is DTS using an internal SQL Server trick somewhere, perhaps network compression? Hard to say, really.
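As a sanity check on the 290 rows/s figure, the arithmetic from the timings in the first post (770,000 rows) is straightforward:

```python
# Rough throughput from the timings in post #1: 770,000 rows.
ROWS = 770_000

def rows_per_sec(minutes: float) -> float:
    return ROWS / (minutes * 60)

print(round(rows_per_sec(44)))  # Spoon at 44 min -> 292 rows/s
print(round(rows_per_sec(21)))  # DTS at 21 min -> 611 rows/s
```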

  3. #3

    Default

    The table is about 300 columns wide, so I am used to slow processing, especially from SQL Server 2000 and, to a lesser extent, from Postgres. What worries me is the DTS/Spoon ratio. Usage of the 8 CPUs only exceeds 20% from time to time, and both are running on the same physical and virtual machine, so there should be no network problem. (Yes, I am no network nerd; please correct me...)

    What more information could be helpful? I am willing to bother even our network admin.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Again, I have no idea, as I don't know anything about DTS.
    However, it seems likely that for the extremely simple case (one table to one table, all SQL Server) Microsoft did some optimisations.

    Now, try these things:
    - Set a sufficiently large commit size (thousands of rows)
    - Disable thread priority management (Transformation settings, Misc tab)
    - Start multiple copies of the Table Output step (right-click the step)
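    Mapped onto the KTR file from the first post, these suggestions touch elements that are already there; a sketch with illustrative, untuned values:

```xml
<!-- Illustrative values only, not tuned recommendations. -->

<!-- In <info>: larger rowset, thread priority management off -->
<size_rowset>10000</size_rowset>
<using_thread_priorities>N</using_thread_priorities>

<!-- In the Table output <step>: more copies, larger commit size -->
<copies>3</copies>
<commit>5000</commit>
```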

    Perhaps it will go faster.

    Matt

  5. #5
    Join Date
    Mar 2007
    Posts
    16

    Default

    Matt, do you know (or think) whether Kettle would run with Java 5 or 6? Java 6 in particular should bring some performance enhancements...
    Cheers, Tom.

  6. #6

    Default

    That is what our local Java guru says as well. He will take a look at our machine to see what can be improved.

    In the meantime I have done some testing with another table (20 columns, 3M rows), trying the settings you suggested changing:

    - Set a sufficiently large commit size (thousands of rows):
    a little improvement, together with 'nr of rows in rowset'
    - Disable thread priority management (Transformation settings, Misc tab):
    no change
    - Start multiple copies of the Table Output step (right-click the step):
    Wow! Here is the outcome:

    NoOfCopies  rows/s
    1           1000
    2           1500
    3           1884
    4           1100
    5           misbehaves
    Thanks a lot for that hint. I will go on testing the monster table and will let you know.
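If it helps reading those numbers: expressed as speedup over a single copy, the scaling tails off and then regresses at four copies, which hints at contention on the target table (an assumption, not something measured here):

```python
# Speedup relative to one output copy, using the measured rows/s above.
measured = {1: 1000, 2: 1500, 3: 1884, 4: 1100}
baseline = measured[1]

for copies, rate in measured.items():
    print(f"{copies} copies: {rate / baseline:.2f}x")
```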

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Just for the record, running on Java 5 or Java 6 is no problem at all, and it does indeed give a performance increase of up to 15%, depending on the type of transformation.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.