Hitachi Vantara Pentaho Community Forums

Thread: Clustered Environment Setting Question

  1. #1
    Join Date
    Dec 2006
    Posts
    106

    Clustered Environment Setting Question

    Dear all,

    Suppose I have 5 computers on my LAN: 1 local computer (10.20.86.1), 1 master server (10.20.86.2), and 3 slave servers (10.20.86.3 to 10.20.86.5). All I want to do is launch my .ktr file from the local computer and have it processed in clustered mode across the master and the slaves.

    My questions are as follows:

    1. For the master server and 3 slave servers:

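    I unzip Kettle-3.0.0-RC2.zip to C:\ and, on each of these 4 computers, open a Command Prompt in C:\Kettle-3.0.0-RC2\ and run Carte.bat with that machine's own IP address and port 8080 (Carte.bat 10.20.86.2 8080 on the master, through Carte.bat 10.20.86.5 8080 on the last slave) to start it as a server.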
    Is this setting process correct?

    2. For the local computer:

    I only unzip Kettle-3.0.0-RC2.zip to C:\ and use Spoon to execute my .ktr file. First, I define the slave servers in Kettle: I tick the "Is the master" checkbox for 10.20.86.2 only, and set the username/password to cluster/cluster for each of 10.20.86.2 to 10.20.86.5. Second, I define a cluster schema that includes the slave servers I set up before. Third, I select that cluster schema on each step I want to run clustered. Finally, I run the transformation in clustered mode.
    Is this setting process correct?

    Thanks for helping me again!!

    Regards,
    Jeff
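
The setup described in this post can be sanity-checked from the local computer before running anything in clustered mode. The following is a minimal sketch, not part of Kettle: the helper names `carte_commands` and `is_listening` are hypothetical, and the addresses and port are the ones quoted in the post. It only builds the Carte start command for each machine and tests whether something accepts TCP connections on the port; it does not verify that the listener is actually Carte.

```python
import socket

# Addresses and port as given in the post.
MASTER = "10.20.86.2"
SLAVES = ["10.20.86.3", "10.20.86.4", "10.20.86.5"]
PORT = 8080


def carte_commands(hosts, port):
    """Return, per host, the Carte start command to type on that machine."""
    return {host: f"Carte.bat {host} {port}" for host in hosts}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling `carte_commands([MASTER] + SLAVES, PORT)` lists the four commands, and `is_listening("10.20.86.2", 8080)` run from 10.20.86.1 gives a quick indication of whether the master has been started.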

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Seems about right... but only use it if you really need to for performance reasons. As a rule of thumb, I would not even consider parallel processing under 10 million rows. Above 50 million rows it's probably a must; anything in between is a grey area, depending on requirements.

    - The more moving parts the more chance of something breaking
    - Parallelism/clustering does require more planning/designing

    Regards,
    Sven
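
The rule of thumb in this reply can be written down as a small helper. This is only a sketch of the thresholds quoted above (10 million and 50 million rows); the function name `clustering_advice` is hypothetical, and treating exactly 10 or 50 million rows as part of the grey area is an assumption, since the post does not say.

```python
def clustering_advice(row_count):
    """Map a row count to the rule of thumb quoted in the reply above."""
    if row_count < 10_000_000:
        return "not worth considering"
    if row_count > 50_000_000:
        return "probably a must"
    # Assumption: the boundaries themselves count as the grey area.
    return "grey area, depends on requirements"
```

For example, `clustering_advice(2_000_000)` falls in the first category, while `clustering_advice(30_000_000)` lands in the grey area.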

  3. #3
    Join Date
    Dec 2006
    Posts
    106

    aye..

    Thanks a lot, Sven.

    Regards,
    Jeff

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    - You always need one slave server marked as the master in a cluster schema, even if all steps in the transformation are running clustered
    - You can define the master to be on the same slave server as one of the other slaves. (It's just a logical "address".)
    - You can have a step run in multiple copies PER slave server if you create a partitioning schema and apply that to the step
    - All data passes over TCP/IP sockets from master to slave (and in case of re-partitioning, from slave to slave too)
    - Data serialization over TCP/IP sockets takes CPU time. Sometimes this is a lot, sometimes not. (lazy conversion is very efficient in this case)
    - To see how the split of the original transformation happens, you can disable all clustering options in the execution dialog and enable "Show transformations"

    HTH,

    Matt
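
The point about running a step in multiple copies per slave server via a partitioning schema can be pictured as follows. This is an illustrative sketch only, not Kettle's implementation: the function names are hypothetical, and routing rows by "key modulo number of partitions" is an assumption chosen to keep the example simple. With 3 slaves and 2 copies per slave there are 6 targets, and each row is sent to exactly one of them.

```python
from collections import defaultdict


def assign_partitions(slaves, copies_per_slave):
    """Return one (slave, copy_number) target per partition."""
    return [(slave, copy) for slave in slaves for copy in range(copies_per_slave)]


def route_rows(rows, key, targets):
    """Group rows by target, using the integer key modulo the partition count."""
    routed = defaultdict(list)
    for row in rows:
        target = targets[row[key] % len(targets)]
        routed[target].append(row)
    return dict(routed)


# Slave addresses from the original question; 2 copies per slave is an assumption.
targets = assign_partitions(["10.20.86.3", "10.20.86.4", "10.20.86.5"], 2)
routed = route_rows([{"id": i} for i in range(12)], "id", targets)
```

In the real cluster each routed row would also cross a TCP/IP socket from the master to a slave, which is where the serialization cost mentioned in the reply comes from.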

  5. #5
    Join Date
    Dec 2006
    Posts
    106

    Thanks for your tips, Matt.

    Regards,
    Jeff

