Hitachi Vantara Pentaho Community Forums

Thread: perf - table output vs text output

  1. #1

    perf - table output vs text output

    Hey all,
    I hope this is a relevant question --

    EDIT: Kettle 2.5.0

    I have a transformation that reads data from one database (the big_iron database), runs it through a few Modified Java Script Value (ModJS) steps and a Select Values step, and then exports it (on the scale of 1 million records).

    When I export to a Text Output file, it holds a relatively steady 2.5k-3k records/second. Pretty good.

    When I export to a Table-Output step, it drops down to 300-1.2k records/second, *regardless of the number of step copies* I start up. This also appears to slow the read rate from the big_iron source. The Table Output writes to a different database.

    I started looking at 'Nr of rows in rowset' and increased it from 1,000 to 10,000, hoping the larger buffer would smooth out any bottlenecks, but it doesn't seem to help much.
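
    As I understand it, the rowset is just a bounded buffer between the step threads. Here is a toy sketch in plain Java (not Kettle's actual classes; the rates are made up) of why a bigger buffer only absorbs bursts and cannot lift sustained throughput past the slowest step:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Toy model of a rowset: a bounded buffer between two step threads.
    // Once the buffer fills, the fast reader blocks on put() and is paced
    // by the slow consumer -- which would look exactly like a slowed-down
    // read from big_iron.
    public class RowsetSketch {
        public static void main(String[] args) throws InterruptedException {
            // 'Nr of rows in rowset' ~ the queue capacity
            BlockingQueue<Object[]> rowset = new ArrayBlockingQueue<>(10_000);

            Thread reader = new Thread(() -> {      // fast producer
                try {
                    while (true) { rowset.put(new Object[] { "row" }); }
                } catch (InterruptedException ignored) { }
            });
            Thread tableOutput = new Thread(() -> { // slow consumer
                try {
                    while (true) {
                        rowset.take();
                        Thread.sleep(1);            // simulated per-row DB cost
                    }
                } catch (InterruptedException ignored) { }
            });

            reader.start();
            tableOutput.start();
            Thread.sleep(5_000);                    // let it run for a bit
            reader.interrupt();
            tableOutput.interrupt();
        }
    }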

    Any ideas, please? The overall performance of the transformation has dropped just from swapping the export step type (even with multiple copies of that step), and I'd like some options, or at least confirmation that this is normal.
    Last edited by dhartford; 07-19-2007 at 03:23 PM.

  2. #2

    Outputting files to a local file system is pretty fast, of course. Most of the slowdowns I see happen are due to I/O.

    Try bumping the commit size higher. If your data allows it, start up multiple copies of your output step.
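
    For what it's worth, a Table Output-style step boils down to batched JDBC inserts, with the commit size deciding how many rows go between commits. A standalone sketch of that pattern (not Kettle's actual code; the URL, credentials, and table are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Standalone sketch of the batched-insert pattern; a higher commit size
    // means fewer commits (and fewer log flushes) per million rows.
    public class BatchInsertSketch {
        public static void main(String[] args) throws SQLException {
            int commitSize = 5_000; // the knob worth bumping

            try (Connection con = DriverManager.getConnection(
                    "jdbc:yourdb://host/db", "user", "pass")) { // placeholder
                con.setAutoCommit(false); // commit manually, every commitSize rows
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO target_table (id, val) VALUES (?, ?)")) {
                    int pending = 0;
                    for (int i = 0; i < 1_000_000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "row " + i);
                        ps.addBatch();
                        if (++pending == commitSize) {
                            ps.executeBatch(); // one round trip per batch
                            con.commit();
                            pending = 0;
                        }
                    }
                    if (pending > 0) { // flush the tail
                        ps.executeBatch();
                        con.commit();
                    }
                }
            }
        }
    }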

    Regards,
    Sven

  3. #3


    Thanks Sven -
    I've tried starting two, four, eight, and even more copies of the Table-Output step at the end of the transformation, combined with DB commit sizes of 100, 1,000, 5,000, and 10,000.

    The overall problem, regardless of the number of copies, is a hard limit on total records/second through the Table-Output step: about 1.2k records/second summed across all copies (e.g. 4 copies at ~300 r/s each, 8 copies at ~150 r/s each, and similar scaling at other copy counts). While this runs, the read rate and all the other steps are bottlenecked at around 1.2k-1.6k r/s, even though I know they can go much faster.
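
    To rule out the transformation itself, N copies can be mimicked outside Kettle with N writer threads, each on its own connection. A sketch with placeholder connection details and table (and assuming no unique key on id):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.concurrent.atomic.AtomicLong;

    // Mimics N Table-Output copies with N threads inserting into the same
    // table. If the printed total stays ~1.2k rows/s no matter the copy
    // count, the database -- not the transformation -- is the ceiling.
    public class ParallelInsertCheck {
        static final AtomicLong total = new AtomicLong();

        public static void main(String[] args) throws Exception {
            int copies = 4;
            for (int c = 0; c < copies; c++) {
                new Thread(ParallelInsertCheck::writeRows).start();
            }
            long t0 = System.currentTimeMillis();
            Thread.sleep(30_000); // sample for 30 seconds
            double secs = (System.currentTimeMillis() - t0) / 1000.0;
            System.out.printf("%.0f rows/s total across %d copies%n",
                    total.get() / secs, copies);
            System.exit(0);
        }

        static void writeRows() {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:yourdb://host/db", "user", "pass")) { // placeholder
                con.setAutoCommit(false);
                PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO target_table (id, val) VALUES (?, ?)");
                for (int i = 0; ; i++) { // runs until the JVM exits
                    ps.setInt(1, i);
                    ps.setString(2, "row " + i);
                    ps.addBatch();
                    if (i % 1_000 == 999) { // commit size 1,000
                        ps.executeBatch();
                        con.commit();
                        total.addAndGet(1_000);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }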

    Resolution:

    As a separate test, I built a simple transformation that reads a Text Input file (from the same transformation, actually) and writes to that exact same Table Output. Yup, it bottlenecks at 1.2k r/s.

    Multiple copies of a Table-Output step are still limited by how quickly the database will accept records; past that point it simply cannot go any faster (and that ceiling was much lower than I would have expected, to be honest). So yes, Sven, I/O is the bottleneck!

    Case closed.
