US and Worldwide: +1 (866) 660-7555
+ Reply to Thread
Results 1 to 7 of 7

Thread: Creating closure tables

  1. #1
    Join Date
    May 2007
    Posts
    7

    Post Creating closure tables

    I have created a class to automatically create a closure table in Mondrian. Those who read the documentation know that they drastically improve the performance of parent-child hierarchies.

    Since it's java code, you can cast it either from the command line, or from the javascript engine inside a Kettle tranformation or job. I prefer the former since it's easier to maintain.

    PHP Code:
    public class ClosureBuilder {

        
    //static parameters for invoking the class from the command line

        
    public static String JDBC "com.mysql.jdbc.Driver";
        public static 
    String URL "jdbc:mysql://server/schema?user=username&password=password";
        public static 
    String TABLE "source_table";
        public static 
    String CLOSURE "closure_table";
        public static 
    String KEY "child_id";
        public static 
    String PARENT "parent_id";
        public static 
    Object TOP_LEVEL null//value in parent_id that indicates a top level element.
        
        
        
    private Object topLevel;
        
        
    /**
         * If can use this if you want to invoke the class from the command-line.
         * @param args
         * @throws Exception 
         */
        
    public static void main(String[] argsthrows Exception {

        
        new 
    ClosureBuilder().start(JDBC,URL,TABLE,CLOSURE,KEY,PARENT,TOP_LEVEL);

        }

        public 
    void start(String jdbcString urlString tableString closureString keyString pkeyObject topLevelthrows Exception {
        Class.
    forName(jdbc);
        
    this.topLevel topLevel;
        
        
    Connection conn DriverManager.getConnection(url);
        
    conn.createStatement().execute("DELETE FROM " closure);
        
        
    String sql "select " key ", " pkey " from " table;
        
        
    PreparedStatement pst conn.prepareStatement(sql);
        
        
    ResultSet rs pst.executeQuery();
        
        
    //populate table
        
    HashMap<ObjectObjectmap = new HashMap<ObjectObject>();
        while (
    rs.next()) {
            
    map.put(rs.getObject(1), rs.getObject(2));
        }
        
        
    pst conn.prepareStatement("INSERT INTO " closure " VALUES (?,?,?)");
        
        
    int count 0;
        
    //busca pais e distancia
        
    for (Object current map.keySet()) {
            
    Map<Object,Integerparents = new HashMap<Object,Integer>();
            
            
    //add self as distance 0
            
    parents.put(current0);
            
            
    recurseParents(parents,map,current,1);
            for (
    Object parent parents.keySet()) {
            
    pst.clearParameters();
            
    pst.setObject(1current);
            
    pst.setObject(2parent);
            
    pst.setInt(3,parents.get(parent));
            
    pst.addBatch();
            
    count++;
            }
            
    System.out.println(count);
        }
        
    System.out.println("Executing batch...");
        
    pst.executeBatch();
        
    System.out.println("Done!");
        }

        private 
    void recurseParents(Map<Object,IntegerparentsHashMap<ObjectObjectmapObject keyint distance ) {
        
    //catch infinite loop - change at will
        
    if (distance 20) throw new RuntimeException("infinite loop detected:" key);
        
    Object parent map.get(key);
        
           
        if (
    parent == null || parent == this.topLevel || parent.equals(this.topLevel)) {
            return;
        } else {
            
    parents.put(parent,distance);
            
    recurseParents(parentsmapparentdistance 1);
            return;
        }
        
        }



  2. #2
    Join Date
    Jan 2007
    Posts
    568

    Default

    Thanks cjalmeidabr! I'm sure a lot of people will find this useful.

  3. #3
    Join Date
    Nov 1999
    Posts
    1,558

    Default

    I consulted with Matt Casters and he plans to implement this as a kettle step. Here's the email thread.

    Julian:

    > Could this be a step in Kettle?

    Matt:

    > 3 steps actually: Table Input (reader) - Closure Generator - Table Output (writer)
    > That way it's completely generic.
    >
    > That should be easy to do. The parameters [to the Closure Generator] could be:
    > - source db, key pair fieldnames & schema/tablename
    > I'll build it myself somewhere next week.
    > It shouldn't take more than a few hours with the code in hand.
    >
    > I don't know if Mondrian needs this data in-memory, otherwise you
    > could skip writing to a database table completely.
    > (with in-line ETL)

    Julian:

    > Mondrian needs the data on disk - so it can generate joins.
    > For extra credit: one could right click a parent-child hierarchy in the
    > schema designer and say 'generate closure table' and this would create the
    > necessary 2 steps pre-populated with table + column names.

    Matt:

    > What we can do for the generation of the 3 steps is make a factory of sorts.
    >
    > You can pass the metadata and it will create the transformations, optionally
    > executing it.
    > We already have a few of those, it's pretty easy to do.
    > That turns the extra points exercise into a simple Java API problem.

  4. #4
    Join Date
    May 2007
    Posts
    7

    Default Better code

    People,

    I've reimplemented this code. It's better commented, faster and smarter.

    It also fixes some bugs due to my lack of knowledge about Kimball's slowly changing dimension strategies. Hopefully now it's alright.

    Code is attached...
    Attached Files

  5. #5
    Join Date
    May 2007
    Posts
    7

    Default Another update

    a bug fix.

    code is attached.
    Attached Files

  6. #6
    Join Date
    Nov 1999
    Posts
    7,251

    Default

    This step was created in PDI version 3.0.
    It will appear in the upcoming version 3.0.0-RC1. (appeared in subversion trunk revision 5057)

    transitive-closure-generator.jpg

    All the best,

    Matt
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters@pentaho.org

    Author of the upcoming book Pentaho Kettle Solutions by Wiley. Release date: mid-September 2010.

    Join us on IRC server Freenode.net, channel ##pentaho

  7. #7
    Join Date
    Nov 1999
    Posts
    1,558

    Default

    Thanks Matt! Here's the wiki page Matt created:

    http://wiki.pentaho.org/display/EAI/Closure+Generator

+ Reply to Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts