Which Pentaho's tool is the best..



tytoos
04-02-2008, 02:04 PM
Hi everybody,
I started my adventure with Pentaho a few days ago, so I'm a newbie in Pentaho ;) I have to research this solution for my university project, and I have a big request for all of you: please steer me in the right direction. I want to create charts, tables and pie charts on a web page. The data will come from a database and an XML file. When somebody changes the data in the database, I want the dashboard to refresh automatically. My questions for you:
1. Which Pentaho tools MUST I use to do that?
2. On which OS will it be simpler (Windows, Linux)?
3. Do I have to use a server if I don't want to use analysis (Mondrian)?

Thanks for any suggestions ;)

crafter
04-02-2008, 05:56 PM
Perhaps this is a fair place to start.

http://prdownloads.sourceforge.net/pentaho/PentahoTechnicalWhitepaperV1-2.pdf?download

tytoos
04-06-2008, 04:42 AM
Yes, right, it's a good document to start with, but it didn't explain which tools I have to install. I saw that Pentaho has several solutions, and I don't know which of them are the minimum needed to test my task. If something in my first post is unclear, please let me know and I will try to explain it better.
Best,
Tytoos

crafter
04-06-2008, 11:20 AM
Tytoos

I think your questions are quite straightforward. However, I don't think the answer is.

Firstly, let me dispense with your second question. The issue of the most suitable operating system is going to get you very subjective answers. I think it would suffice to say that Pentaho runs on many platforms in the same way, and that would be my final input on the issue.

Regarding the other questions (especially the first), you must first understand that Pentaho is a suite of products, which are designed to work together, but may work quite independently of each other. So it is not necessary to install the entire suite of products. Neither is any one of them necessary for the other (well, sort of).

In addition, some of the components are embeddable: while someone might view them as a complete solution, someone else might view them as a tool.

Not only that, but it is quite possible to use externally supplied tools and applications and merge their output into Pentaho.

So, I guess what I'm saying is that your answer is not the same as mine ;).

What I can suggest is that you go to the Community|Community Home| [product] page from the top menu bar that will point you to what's available. There are links there to the download page which will also help you in your response.

Good luck in your search and do report back on your findings. It will be interesting to see the results of your study.

tytoos
04-06-2008, 02:07 PM
"Firstly, let me dispense with your second question. The issue of the most suitable operating system is going to get you very subjective answers. I think it would suffice to say that Pentaho runs on many platforms in the same way, and that would be my final input on the issue."

I know that Pentaho is a multiplatform tool, but I'm interested in whether administration is the same on every OS. Or maybe it is simpler on Windows than on Linux??



"Regarding the other questions (especially the first), you must first understand that Pentaho is a suite of products, which are designed to work together, but may work quite independently of each other. So it is not necessary to install the entire suite of products. Neither is any one of them necessary for the other (well, sort of)."

Yes, I know that Pentaho is a suite of products, that's why I wrote this post!! I'm interested in which of them I must install for my project. I don't want to install all of them, only the essential ones.



"Not only that, but it is quite possible to use externally supplied tools and applications and merge their output into Pentaho."

Hmmm, but why?? Pentaho has so many tools that I don't know which ones to choose, so why search for other external tools??



"So, I guess what I'm saying is that your answer is not the same as mine ;)."

Maybe yes, but could you suggest which tools you would choose if you had to prepare the same project as mine?? :)

Thank you so much for your help :)
Best,
Tytoos

crafter
04-06-2008, 03:46 PM
"...but could you propose which tools you choose if you must prepare the same project as mine??"

I can only speak from my own experience here.

Well, firstly the BI platform itself is the glue that ties everything together. The platform itself is installed in an application server. The pre-configured installation (PCI) that can be downloaded comes built into the JBoss Portal, but the platform is also commonly installed in other portals and application servers like Liferay and Tomcat.

In order to create xactions that the BI platform will execute and display on your dashboard, you will need the Design Studio.

You could design reports to publish on the BI platform, then you would need the Report Designer and/or Report Wizard. An ad-hoc reporting tool is also available to be installed for web-based report creation.

If your data is in XML and a database, you will need PDI (sometimes referred to by its old name, Kettle), which is an ETL tool.

The Mondrian tool allows you to provide a web-based interface for analysis tasks. You could use the Cube Designer and/or analysis workbench to create your analysis cubes.

Weka is the data mining tool, although I haven't used this.

Then the Meta Data Editor allows you to create your metadata layer to provide data in a way that makes sense to business users.

In terms of automated updating of dashboard content, you could use the Ajax library or use plain old Ajax by sending periodic requests to the server.
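
A minimal sketch of the periodic-request idea, assuming nothing about Pentaho's own Ajax library: poll the server, compare a fingerprint of the response with the previous one, and redraw only when it changed. It is shown in Python for brevity; in a browser the same loop would be JavaScript on a timer. The names `fingerprint` and `poll_once` and the fake `responses` are hypothetical, made up for illustration.

```python
import hashlib
import json


def fingerprint(payload):
    """Return a stable hash of the data so two polls can be compared."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def poll_once(fetch, last_seen):
    """Fetch the data once and report whether it changed since the last poll.

    `fetch` is a hypothetical callable standing in for the HTTP request
    that the browser would send to the server.
    """
    payload = fetch()
    current = fingerprint(payload)
    return payload, current, current != last_seen


# Simulated server responses for three consecutive polls (assumed data).
responses = iter([{"sales": 10}, {"sales": 10}, {"sales": 12}])
last_seen = None
redraws = []
for _ in range(3):
    payload, last_seen, changed = poll_once(lambda: next(responses), last_seen)
    if changed:
        redraws.append(payload)  # a real dashboard would redraw its charts here
```

The second poll returns the same data as the first, so only the first and third polls would trigger a redraw.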

Maybe others have a different approach.

Taqua
04-06-2008, 06:04 PM
OK, that was the complete overview :)

For your use-case, you need at least the Pentaho-Platform and the Reporting-Engine. Depending on how good your data already is, you may be able to get by with a lightweight approach.

Now the breakdown into scenarios:

(1) Panic Mode: "I need results! Now! Please!" aka "I get fired if I have no results within the next minutes"

You need at least the Pentaho Platform. To design reports, you need the Report-Designer as well. The reports will solve your tabular data requirement and may solve your charting requirement. For the charts, you can alternatively use the Pentaho-Chart-Components or (but that requires some manual work on your side) use third-party charting engines.

The dashboard is based on a JSP file; inside the Pentaho Preconfigured Installation (aka the Demo), you'll find a dashboard example that already contains everything you need. Take it as a starting point for your journey.

In Panic mode, your reports and charts would use direct JDBC queries to get the data to display. Depending on your underlying data-model, this may work reasonably well, but in most cases, you will run into severe performance issues once your datamodel is either highly complex or your database is very large.

Then it is time for mode 2.

(2) Data-Warehousing - a data shopping mall for your enterprise

In very simple words, a data-warehouse (DWH from now on) is a database that contains a preprocessed copy of the data in your original systems. The DWH abandons the idea of highly normalized tables in favor of lightning-fast query performance. Like relational modelling, building a DWH can be anything from stupid simple to highly complex. It is definitely a good thing to read a couple of introductory articles on how to build a DWH (and take some time to learn the ideas behind it) before deploying a final solution.
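
To make the "preprocessed copy" idea concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names (`customers`, `orders`, `sales_by_region`) and the data are invented for illustration, and a real DWH model would be considerably richer than one summary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables, as an operational system might hold them.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 3, 25.0), (4, 1, 75.0);

    -- Denormalized, pre-aggregated copy: the "warehouse" side.
    CREATE TABLE sales_by_region AS
        SELECT c.region AS region, SUM(o.amount) AS total, COUNT(*) AS order_count
        FROM orders o JOIN customers c ON c.id = o.customer_id
        GROUP BY c.region;
""")

# The dashboard query reads the small summary table and needs no join.
summary = conn.execute(
    "SELECT region, total, order_count FROM sales_by_region ORDER BY region"
).fetchall()
```

The join and aggregation run once when the summary table is built, not on every dashboard request; the price is that the copy is only as fresh as its last rebuild.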

If you start slow and simple, you almost can't go wrong. But if you try the tools without knowing what you are doing, then you will have a fun time shooting yourself in the foot before you then go back to read the articles or books anyway :)

To set up a DWH, you will need Mondrian (this is the OLAP server that will receive and answer your queries and do all the fancy optimization so that you do not have to wait ages for the results) and Kettle.

Kettle (or PDI, although only few people call it by that name :)) is the tool that copies the data from your source systems (i.e. your sales and accounting databases) into the DWH. Calling Kettle a "copy program" could not be more accurate and more offensive to the massive work and knowledge behind it. A simple copy operation usually does not help you much, as the data in the source systems is usually not in a suitable format for the DWH. Its quality is usually bad (duplicate entries, invalid entries, missing data, it is amazing what you encounter in some systems), and so Kettle contains means to correct the data to make sure that your DWH is in a sane state.
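
As a rough illustration of the kind of clean-up meant here (this is plain Python, not Kettle, and says nothing about how Kettle does it internally), the sketch below handles the three problems named: duplicates, invalid entries and missing data. The function `clean_rows`, the field names and the validity rules are all assumptions made up for the example.

```python
def clean_rows(rows, default_region="UNKNOWN"):
    """Split raw source rows into (clean, rejected) lists."""
    seen_ids = set()
    clean, rejected = [], []
    for row in rows:
        order_id = row.get("id")
        amount = row.get("amount")
        # Invalid entries: no id, or an amount that is missing or negative.
        if order_id is None or not isinstance(amount, (int, float)) or amount < 0:
            rejected.append(row)
            continue
        # Duplicate entries: keep only the first row seen for each id.
        if order_id in seen_ids:
            rejected.append(row)
            continue
        seen_ids.add(order_id)
        # Missing data: fill in a placeholder instead of dropping the row.
        region = row.get("region") or default_region
        clean.append({"id": order_id, "amount": float(amount), "region": region})
    return clean, rejected


raw = [
    {"id": 1, "amount": 100, "region": "North"},
    {"id": 1, "amount": 100, "region": "North"},  # duplicate
    {"id": 2, "amount": -5, "region": "South"},   # invalid amount
    {"id": 3, "amount": 40, "region": None},      # missing region
]
clean, rejected = clean_rows(raw)
```

Rejected rows are kept rather than silently discarded, so someone can inspect what the source systems are producing.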

As with the initial modelling of your DWH, the data import has a learning curve. Simple things are simple and easy, but sadly reality tends to be complex. So start with a simple case and continue from there.

Good books on DWH modelling should also cover the basics of the copy operations (called ETL) as well.

(3) Gimme all!

The last missing piece is the automatic refresh. Basically, what you are requesting here is a bit unrealistic (to put it mildly). Just imagine the fun your database administrators have if on every accounting record entered in the system, the accounting database gets queried with a complex aggregation query. Expect to enter a record per hour (5 seconds for the input, and 59 minutes and 55 seconds for the refresh-queries to execute).

A more realistic goal is to (for instance) have an update of your DWH every 30 minutes, every hour or every 6 hours (depending of course on the amount of data you have and the $$$ you want to pay for equipment. I'm quite sure with a system large enough you can get real-time updates, just keep your 20 GB or so database in memory on a dedicated cluster).
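
A sketch of interval-based refreshing, under the assumption that some external timer or scheduler checks in periodically. The class `ScheduledRefresh` and its `rebuild` callback are hypothetical; in practice the callback would start the ETL job that reloads the DWH.

```python
from datetime import datetime, timedelta


class ScheduledRefresh:
    """Run a rebuild callback at most once per interval."""

    def __init__(self, rebuild, interval):
        self.rebuild = rebuild
        self.interval = interval
        self.last_run = None

    def tick(self, now):
        """Call the rebuild if the interval has elapsed; return True if it ran."""
        if self.last_run is None or now - self.last_run >= self.interval:
            self.rebuild()
            self.last_run = now
            return True
        return False


runs = []
job = ScheduledRefresh(lambda: runs.append("rebuilt"), timedelta(minutes=30))
start = datetime(2008, 4, 6, 12, 0)
# Simulate a check every 10 minutes for one hour (7 checks in total).
results = [job.tick(start + timedelta(minutes=10 * i)) for i in range(7)]
```

With a 30-minute interval, the rebuild runs on the checks at 0, 30 and 60 minutes, no matter how often the source data changes in between.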

If you executed your job in Mode 2 correctly, getting into the near-realtime mode is just a matter of tuning your model, queries and databases and maybe upgrading of your hardware.



BI is no magic, but there is a reason why BI consultants get a lot of money for doing their job. Implementing a BI solution requires at least some understanding of the underlying mechanics and concepts of these systems. But even that is not rocket science and everyone can learn it.

tytoos
04-07-2008, 10:29 AM
Crafter, Taqua
Thank you so much for the thorough answers :) This gets to the root of the matter, and I needed these comments from people who know Pentaho best. I will try to put your suggestions to use and I hope my project will work :)
Regards,
Tytoos