Hitachi Vantara Pentaho Community Forums

Thread: clustering and partitioning

  1. #1
    Join Date
    Jun 2007
    Posts
    128

    clustering and partitioning

    Hi,

    I would like to know about clustering and partitioning.
    I have seen some examples of clustering.
    As I understand it, in a clustering schema we specify the master and slave servers,
    so our process will be distributed among the different slaves. Is that so?
    Then what is database partitioning, and how can we use a clustering schema and database partitioning together?

    Thanks

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    So our process will be distributed among the different slaves. Is that so?
    Yes.

    Then what is database partitioning, and how can we use a clustering schema and database partitioning together?
    Suppose you want to split the load over different databases.
    What you can do in that case is configure a (database) partitioning schema.
    For example, if you define a partition schema with 20 partitions in it, you can create a "cluster" of databases DB1 through DB20 (see the Database connection dialog, the cluster tab).

    What happens is that each step in the transformation that has the partitioning schema set will be started 20 times, once per partition.
    If you use the simple "Remainder of division" partitioning method and specify an ID as the key, each row goes to the partition given by the remainder of its key divided by 20: rows with remainder 0 end up on the first database, rows with remainder 19 on the 20th, and so on.
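
To make the "Remainder of division" routing concrete, here is a minimal Python sketch. It is not Kettle code; the names `partition_for`, `database_for`, and the `DB1` to `DB20` list are hypothetical and only mirror the 20-partition example from the post.

```python
def partition_for(key: int, partition_count: int) -> int:
    """Return the zero-based partition index for an integer key."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # Remainder of division: the key modulo the number of partitions.
    return key % partition_count


# Hypothetical database names DB1..DB20, one per partition.
DATABASES = [f"DB{i}" for i in range(1, 21)]


def database_for(key: int) -> str:
    """Pick the database that would receive a row with this key."""
    return DATABASES[partition_for(key, len(DATABASES))]
```

With this sketch, a key of 0 maps to the first database and a key of 19 to the 20th, and a key of 20 wraps around to the first database again.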

    If you use partitioning in combination with clustering, we split the load of the different partitions over the slave servers.
    For example, with the 20 partitions above and 5 slave servers, each slave server handles 4 partitions.
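
The arithmetic above (20 partitions over 5 slave servers gives 4 each) can be sketched as follows. The round-robin assignment order is an assumption made for illustration; the post only states the per-slave count, not which partitions land on which slave. The function and slave names are hypothetical.

```python
def assign_partitions(partition_count: int, slave_names: list[str]) -> dict[str, list[int]]:
    """Spread partition indexes over slave servers.

    Assumption: round-robin order. The real assignment order used by the
    engine is not stated in the thread.
    """
    if not slave_names:
        raise ValueError("at least one slave server is required")
    assignment: dict[str, list[int]] = {name: [] for name in slave_names}
    for partition in range(partition_count):
        owner = slave_names[partition % len(slave_names)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical slave names for the 5-server example.
plan = assign_partitions(20, ["slave1", "slave2", "slave3", "slave4", "slave5"])
```

In this sketch every entry of `plan` holds exactly 4 partition indexes, matching the count given in the post.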

    Please note that we removed the word "database" from "Database partitioning schema" in V3, because partitioning works perfectly fine with files and all kinds of steps as well.

    A few links:

    All the best,

    Matt

  3. #3
    Join Date
    Jun 2007
    Posts
    128

    Hi,

    In clustering, processing is done on the same database using different servers.
    In partitioning, processing is done on different databases.
    Can we use a database join and get the required results by spanning all partitions, as if the query were performed on a single central database?

    Thanks
    Sreelatha

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Absolutely. If you specify a database lookup step to be partitioned, the transformation will make sure that the rows of data arriving at each copy of the step (1 of 20) are partitioned correctly.
    That means that, if you plan it carefully, each lookup is done in just one of the 20 databases: the one corresponding to the row's partition.
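
The idea of a partitioned lookup can be sketched in Python as below. This is an illustration only, assuming the "Remainder of division" method and using in-memory dictionaries as stand-ins for the separate databases; `build_partitions` and `partitioned_lookup` are hypothetical names, not part of any Pentaho API.

```python
def build_partitions(rows: dict[int, str], partition_count: int) -> list[dict[int, str]]:
    """Store each row in the partition given by key % partition_count."""
    partitions: list[dict[int, str]] = [{} for _ in range(partition_count)]
    for key, value in rows.items():
        partitions[key % partition_count][key] = value
    return partitions


def partitioned_lookup(partitions: list[dict[int, str]], key: int):
    """Look the key up in exactly one partition instead of all of them."""
    target = partitions[key % len(partitions)]
    return target.get(key)  # None when the key is not present


# Four partitions instead of 20 to keep the example small.
stores = build_partitions({1: "a", 5: "b", 6: "c"}, 4)
```

Because the data was stored with the same rule that routes the lookup, each lookup touches one store only, which is why the work can be spread evenly over the databases.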

    This is of course the big trick to make it scale linearly: 20 databases can do lookups a lot faster than a single one.
    BTW, nothing prevents you from copying data in case it partitions badly; you will still see a speedup. For that reason we created the mirror partitioning method.

    Matt
