Help, it's slower than a granny

05-20-2011, 09:21 AM
Dear all,

I switched from Pentaho CE to the Pentaho for Hadoop version at the beginning of May, following the installation manual you provided. In my opinion, it has only gotten slower: it now runs slower than on my laptop, and that says something.

Hardware (in VirtualBox):
2516 MB memory
8 GB hard disk

OS: Ubuntu 11.0.X
Hadoop (latest version)
Pentaho for Hadoop (latest downloadable version)

I have the following processes running:

MySQL (Pentaho version)
BI Server

The speed at the moment is 230 r/s at the input and around 190 r/s at the output.

The transformation I am running is simple and straightforward:

1x table input
6x Combination lookup
1x fact table

It uploads to an external server.

It has to process 24 million rows, and 60 million next week. So what would you recommend? Should (or could) I combine more nodes from other servers to process it more quickly? If so, is there a manual you can point me to?
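For a sense of scale, here is a rough sketch of how long these loads would take at the rates reported above. It assumes the observed throughput stays constant for the whole run; the helper name `hours_to_process` is illustrative only and not part of PDI.

```python
# Back-of-the-envelope run times for the loads described above.
# Assumption: the observed throughput stays constant for the whole run.

ROWS_THIS_WEEK = 24_000_000
ROWS_NEXT_WEEK = 60_000_000
OBSERVED_RATES = (190, 230)  # rows per second, as reported in this post


def hours_to_process(rows: int, rows_per_second: float) -> float:
    """Wall-clock hours needed to move `rows` at a steady `rows_per_second`."""
    return rows / rows_per_second / 3600


if __name__ == "__main__":
    for rows in (ROWS_THIS_WEEK, ROWS_NEXT_WEEK):
        for rate in OBSERVED_RATES:
            hours = hours_to_process(rows, rate)
            print(f"{rows:>10,} rows at {rate} r/s: {hours:5.1f} h")
```

At 190 to 230 r/s, the 24 million rows take roughly 29 to 35 hours, and the 60 million rows roughly 72 to 88 hours.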

05-20-2011, 09:45 AM
Hello Jiann,

Are you using the Transformation Job Executor to run the PDI engine inside Hadoop? If so, make sure you are on the latest EE release (version 4.1.2), as it includes some significant performance improvements for that use case.


05-20-2011, 10:43 AM
As far as I can see, I am using the newest EE engine:

PDI 4.1.2

On my laptop I am using 4.1.0, and it runs 80 times faster: between 5,000 and 8,000 r/s.

05-20-2011, 10:46 AM
That doesn't sound like an issue I'm familiar with; a performance difference that large is very strange. I would recommend opening a case with support so that you can securely exchange your transformations with them and we can dig deeper. We may be able to provide some suggestions, or, if a defect is causing this, we can get it captured and fixed in the next patch release.

05-23-2011, 05:26 AM
I just opened a case and hope to get a useful response. I also have another question: on my own localhost, I have no problem getting a speed between 5k and 8k r/s, but when I connect to my staging server, it drops rapidly to 200 r/s (within a minute). Is that normal?

05-23-2011, 06:46 AM
Absolutely not, which is why I suggested you open a case so that we can help you troubleshoot. It may be a difference in configuration, or your staging server may somehow be having trouble accessing some of the resources used in the transformations/jobs. Have you turned on performance monitoring and watched the step performance graph to see whether a particular step is the bottleneck? Are the JVM memory allocations on the staging server different? Are the logging levels different between the runs?

Try to get as much environment and reproduction information over to support as possible, and I'm sure we can get to the bottom of it.


05-25-2011, 03:33 AM
Hi Jiann,

You say:

"The transformation I have running is simple and straightforward.

1x table input
6x Combination lookup
1x fact table"

Are you processing your data with the Pentaho JVM on your laptop, or with Hadoop as the engine?

I ask because the metrics you mentioned (200 r/s) sound like a normal transformation running on your local JVM. Hadoop does not report such metrics.

05-26-2011, 06:12 AM
Dear Jasper,

On my laptop I am running the Windows version of Pentaho against localhost, and I get an average of 7,500 r/s. With Hadoop, I am CONNECTED to the remote server while running Pentaho for Hadoop locally, and I get 200 r/s. These are two separate machines.
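To put the gap between the two setups in concrete terms, here is a small sketch comparing the two reported rates (7,500 r/s locally, 200 r/s against the remote server) for the 24 and 60 million row loads mentioned earlier in the thread. It assumes each rate holds steady for the whole load; the constant and helper names are illustrative only.

```python
# Compare the two setups described above: the local Windows run against
# localhost versus the run connected to the remote server.
# Assumption: each rate is a steady average over the whole load.

LOCAL_RATE = 7_500   # average rows/s on the laptop (localhost)
REMOTE_RATE = 200    # rows/s when connected to the remote server
LOADS = (24_000_000, 60_000_000)


def hours_to_process(rows: int, rows_per_second: float) -> float:
    """Wall-clock hours needed to move `rows` at a steady `rows_per_second`."""
    return rows / rows_per_second / 3600


if __name__ == "__main__":
    print(f"Local run is {LOCAL_RATE / REMOTE_RATE:.1f}x faster")
    for rows in LOADS:
        local_h = hours_to_process(rows, LOCAL_RATE)
        remote_h = hours_to_process(rows, REMOTE_RATE)
        print(f"{rows:>10,} rows: {local_h:5.1f} h local vs {remote_h:5.1f} h remote")
```

At these rates the gap is about 37.5x: roughly 0.9 h versus 33 h for 24 million rows, and about 2.2 h versus 83 h for 60 million rows.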