Hitachi Vantara Pentaho Community Forums

Thread: Spoon using only 3% of CPU but running slow!

  1. #1

    Hi all,

    I have a large input set of 60k rows, which is fed into a Cartesian product with another set of 10k rows. Spoon takes hours to perform that product and a few further steps.
    On the other hand, it is only using 3% of the CPU!

    Can you advise how I can make Spoon use more CPU or memory, or otherwise run faster?

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Sometimes Join-Rows is a bad idea.

    Do you have any join conditions set or is it really 600,000,000 result rows you want to produce?
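To make the question above concrete, here is a minimal Python sketch (not PDI code) of why a join condition matters. The function names and the `key` field are hypothetical, chosen only for illustration; the row counts are the ones from this thread.

```python
def cartesian_row_count(left_rows: int, right_rows: int) -> int:
    """Number of rows an unconditional cartesian product emits."""
    return left_rows * right_rows


def keyed_join(left, right, key):
    """Join two lists of dicts on `key` by indexing the right side first.

    Only matching pairs are produced, so the full product is never built.
    """
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**l, **r}


# 60k rows x 10k rows with no condition
total = cartesian_row_count(60_000, 10_000)  # 600,000,000

# Tiny illustrative data set: only rows sharing an id are paired
left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
matched = list(keyed_join(left, right, "id"))
```

If the later left outer joins could supply such a key, the intermediate row count might shrink a lot; whether that applies here depends on details the thread does not give.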
    So long, and thanks for all the fish.

  3. #3

    I have a few left outer joins after the Cartesian product. Any ideas?

  4. #4
    Join Date
    Aug 2011
    Posts
    236

    Hi,

    Is there no way to preprocess some of your input? If you are really processing 600M rows every time, I think you are in a losing battle.

    You need to re-examine your flow/design if you can and possibly make any output re-usable for the next run.

    Without knowing a bit more, it's tough to help.
    PDI 8.0.0
    MySQL - 5.6.27
    Redshift - 1.0.1485
    PostgreSQL 8.0.2
    OS - Ubuntu 10.04.2

  5. #5
    Join Date
    Aug 2016
    Posts
    290

    Wait, 600'000'000 rows? With how many columns? Even if you had a single column holding just a Java int, that is 4 bytes per row. 600'000'000 rows multiplied by 4 bytes = 2'400'000'000 bytes, or about 2.2 GiB per column!
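The estimate above can be checked with a few lines of Python. This is only a rough lower bound under the assumption of 4 raw bytes per value; it ignores any per-object or per-row overhead, and the function name is made up for this sketch.

```python
def column_size_bytes(rows: int, bytes_per_value: int) -> int:
    """Raw size of one column, ignoring any object or row overhead."""
    return rows * bytes_per_value


rows = 60_000 * 10_000                  # 600,000,000 rows from the product
size_bytes = column_size_bytes(rows, 4)  # 4 bytes assumed per value
size_gib = size_bytes / 2**30            # binary gigabytes (GiB)
size_gb = size_bytes / 10**9             # decimal gigabytes (GB)

print(f"{size_bytes:,} bytes = {size_gib:.2f} GiB = {size_gb:.1f} GB")
```

Every additional column of the same width adds the same amount again, so the real footprint is likely well above this figure.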


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.