Hitachi Vantara Pentaho Community Forums

Thread: Setting up a demo of BI Server

  1. #1
    Join Date
    Apr 2012
    Posts
    4

    Default Setting up a demo of BI Server

    Hi all - I'm currently trying to evaluate the Pentaho suite for my company. To begin with, I'm evaluating the BI Server. I started by downloading biserver-ce-3.10.0-stable.tar.gz and was able to spin up the BI Server and Admin Console pretty easily on a Linux box. The included sample data all works fine, too.

    Now I'm trying to play around with creating some ad hoc reports and analysis off of a fact table and surrounding dimensions that I have in a MySQL database. My fact table has a little over 9 million rows, and the dimension tables have up to ~300K rows. I was able to create the database connection in the Admin Console just fine, but I can't seem to create a data source that references these tables in the BI Server.

    When I try to create the data source, I specify that it should be built from database tables. I then select my fact and dimension tables, but I cannot get past the step where I define the table joins. If I specify that the data source is to be used in both Reporting and Analysis, I have to indicate which table is the fact table. After doing that and reaching the screen where I define the joins, my fact table is shown on the left side of the join and one of my dimension tables is shown on the right side. For the dimension table I see all of its columns, but no columns appear for my fact table on the left side. Looking at the Tomcat log, I eventually see errors relating to "OutOfMemoryError: GC overhead limit exceeded".
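
    For reference, I'm spotting these errors by watching the Tomcat log while the data source wizard runs; a rough sketch of what I'm doing (the log location may differ depending on your install):

    # from the biserver-ce directory: follow the Tomcat log and filter for memory errors
    tail -f tomcat/logs/catalina.out | grep -i "outofmemoryerror"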

    So, one thing I tried was to increase the amount of RAM available to the JVM in the start-pentaho.sh script. I updated it as follows:
    export CATALINA_OPTS="-Xms256m -Xmx4096m -XX:MaxPermSize=256m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000"
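
    To double-check that the running Tomcat process actually picked up the new settings, something like the following works (just a sketch; the grep patterns may need adjusting for your setup):

    # list Java processes with their JVM arguments; -Xmx4096m should show up for Tomcat
    jps -lvm | grep -i catalina
    # or, without the JDK tools on the path, inspect the raw process arguments
    ps -ef | grep -- '-Xmx' | grep -v grep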

    So, my understanding is that the JVM now starts with 256MB of RAM but is allowed to grow to a maximum of 4096MB. After making this change, I went back and tried to create my data source again. I still see the same behavior on the front end (no columns in the left join area for my fact table), but it now takes longer (about 10 minutes) before the same memory error appears. Once I get to the screen for defining table joins, running top on the server shows the related Java process using 100% of one CPU and consuming 4.5GB of RAM. This persists until the memory error is thrown.
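
    For a closer look at what the heap is doing before the error shows up, the JDK's jstat tool can sample garbage-collection activity (rough sketch; replace <pid> with the Tomcat process id from jps or top):

    # sample GC utilization every 5 seconds; an old generation (O) stuck near 100%
    # with climbing full-GC counts usually precedes "GC overhead limit exceeded"
    jstat -gcutil <pid> 5000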

    The box I'm running this on is quite beefy, and it also hosts the MySQL database that I'm trying to create the data source from. The server has an 8-way AMD 885 CPU and 64GB of RAM.

    I'm guessing that since I'm using the pre-configured BI Server, the configuration may not be optimal for the data set I'm experimenting with, but I'm not sure what I should adjust beyond giving the JVM more memory. I also tried to reconfigure the BI Server to use MySQL for the Hibernate and Quartz layers, thinking that might help. Unfortunately, while I thought I had that set up correctly, I couldn't get past the login to the BI Server after making the change. So, I've gone back to the built-in Hibernate/Quartz setup with the default Hypersonic configuration.

    Obviously, I'm new to the Pentaho suite of products, so any advice or help would be much appreciated!

    thanks!
    arawan

  2. #2
    Join Date
    Apr 2012
    Posts
    4

    Default

    UPDATE: I see that Pentaho is executing a "select *" against the fact table. That session disappears from the database after a few minutes, but the Java process continues to churn until it throws the OutOfMemoryError. So, it seems that when creating a new data source, Pentaho selects all of the data from the tables being added, and the entire result set is held in JVM memory? If that's the case, I can understand why I would see the OutOfMemory error, but I don't understand why that would be necessary just to define table joins. Also, my fact table contains 2.18GB of data and I've allowed the JVM a maximum of 4096MB, so I'm not sure why that wouldn't be enough in any case.
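
    For anyone curious how I saw the query: watching the MySQL process list while the wizard runs shows the statement, and information_schema gives the on-disk size of the fact table (sketch; 'mydb' and the credentials are placeholders):

    # show the statements currently running against the database
    mysql -u root -p -e "SHOW FULL PROCESSLIST;"
    # on-disk size (data + indexes) of each table in the schema, in MB
    mysql -u root -p -e "SELECT table_name, ROUND((data_length + index_length)/1024/1024) AS size_mb FROM information_schema.tables WHERE table_schema = 'mydb';"
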
    Last edited by arawan; 04-04-2012 at 07:14 PM.

  3. #3
    Join Date
    Apr 2012
    Posts
    4

    Default

    Well, I've now discovered that I can define the data source using Pentaho Metadata Editor, publish it to the BI Server, and then create an ad hoc report off the newly defined data source. It seems that defining the data source within the BI Server interface is not well suited to large data sets.

  4. #4
    Join Date
    Nov 2005
    Posts
    164

    Default

    arawan,

    Yes, you are correct. The Data Access component is for smaller data sets or for quick prototyping. For the full set of Pentaho features, you will still need to use the client tools like PME and/or Schema Workbench. However, we're always making progress.

  5. #5
    Join Date
    Apr 2012
    Posts
    4

    Default

    Thank you for the confirmation, bhagan. I've been learning my way around PME over the last week. I haven't looked at Schema Workbench yet, so I'll take a look at that, too.
