Hitachi Vantara Pentaho Community Forums

Thread: table input clustering?

    With tons of rows to crunch every hour, I decided to go with a cluster made up of 3 slaves dedicated solely to running Kettle jobs/transformations.

    I configured Carte on every box and set up a dynamic cluster (specifying only the master server in it and checking the "Dynamic cluster" box).

    On that side everything seems OK, but problems came up with "clustered" steps, since it seems that:

    • you cannot cluster a step unless you select "Distribute rows" under "Data movement"
    • you cannot cluster "Sort rows" unless you use NFS (a shared file system)
    • you cannot cluster "Table input" steps "...since reading part can't be run in parallel" (I won't mention who told me this...)
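To make the first point concrete, here is a minimal Python sketch of the difference between distributing rows and copying them to several targets. It is not Kettle code; the helper names `distribute` and `copy` are hypothetical, and round-robin is only assumed here as the simplest way to hand each row to exactly one target.

```python
from itertools import cycle


def distribute(rows, n_targets):
    """Round-robin: every row goes to exactly one of the targets."""
    targets = [[] for _ in range(n_targets)]
    for target, row in zip(cycle(targets), rows):
        target.append(row)
    return targets


def copy(rows, n_targets):
    """Copy: every target receives the complete row set."""
    rows = list(rows)  # materialize once so each target gets all rows
    return [list(rows) for _ in range(n_targets)]


if __name__ == "__main__":
    print(distribute(range(7), 3))
    print(copy(range(3), 2))
```

With distribution each node works on a disjoint share of the rows, which is what makes clustering pay off; with copying every node would repeat the same work.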

    The first two points are clear to me now and I'll be careful from now on, but the third one sounds very strange to me.

    Now, my "dummy's question" is: what is clusterable?

    In my opinion the best way to improve performance is to run queries in parallel by distributing them across several nodes, because these steps are normally the bottlenecks of the transformation.
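What I have in mind could be sketched roughly like this in Python (not Kettle itself). Everything here is an assumption made for illustration: the `orders` table, its integer `id` key, the helper names, and the idea of slicing the data with a modulo predicate so that each of the 3 nodes reads a disjoint share. SQLite and threads merely stand in for the real database and the slaves.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

NODES = 3  # assumption: one slice per slave


def slice_query(table, key, node, nodes):
    # Hypothetical helper: restrict the query to the rows belonging to one node.
    return f"SELECT {key} FROM {table} WHERE {key} % {nodes} = {node}"


def read_slice(db_path, node):
    # Each worker opens its own connection, as each slave would.
    conn = sqlite3.connect(db_path)
    try:
        sql = slice_query("orders", "id", node, NODES)
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


def run_demo(row_count=100):
    # Build a throwaway database file so separate connections see the same data.
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
        conn.executemany(
            "INSERT INTO orders VALUES (?)",
            [(i,) for i in range(1, row_count + 1)],
        )
        conn.commit()
        conn.close()
        # Read the three slices concurrently.
        with ThreadPoolExecutor(max_workers=NODES) as pool:
            return list(pool.map(lambda n: read_slice(db_path, n), range(NODES)))
    finally:
        os.remove(db_path)


if __name__ == "__main__":
    for node, ids in enumerate(run_demo()):
        print(node, len(ids))
```

The slices do not overlap and together cover the whole table, so the downstream steps on each node see every row exactly once across the cluster. Whether this actually speeds things up depends on whether the database can serve the three queries in parallel.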

    Am I taking the wrong approach? I'm attaching a very simple transformation to illustrate my scenario, which is obviously more complex.

