records per sec slowly decreasing as the transformation progresses
Hi, I know other people have commented on this issue but I don't know if it was ever resolved for them.
I have a large transformation (175 steps) that ran for 2.5 hours and converted 4 million rows of data.
After making a few more changes, I cannot get it to run to completion.
I notice the records-per-second rate starts around 550 r/s and slowly decreases to about 300 r/s after approximately 600,000 rows.
Is this to be expected?
At that point the Spoon application seems to freeze and I cannot get it to respond.
I am running 4.10 stable.
The input database table and the output database table(s) are on the same box, with a lot of transforming steps between them.
Data movement is set to "Copy data to next step" for EVERY step.
If I am running out of memory, I must be very close, because I have run earlier versions successfully.
I am just wondering if there are any tips or tricks to reduce the amount of memory used?
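If memory really is the limit, one option (not mentioned in this thread, so treat it as a sketch) is to raise the JVM heap that Spoon launches with rather than shrink the transformation. In PDI 4.x the heap size lives in the launcher script (spoon.sh / Spoon.bat); the exact variable name and default differ between versions, and `PENTAHO_DI_JAVA_OPTIONS` is only honored by some releases:

```shell
# Sketch: raise Spoon's maximum JVM heap. Values are illustrative.
# In PDI 4.x, spoon.sh builds the Java options into an OPT variable;
# you can edit the -Xmx setting there directly, e.g.:
#   OPT="-Xmx1024m ..."
# Some releases instead honor this environment variable, avoiding a script edit:
export PENTAHO_DI_JAVA_OPTIONS="-Xmx1024m"
sh spoon.sh
```

If Spoon freezes near the old limit even after this, watching GC activity (e.g. with `jstat` on the Spoon process) can confirm whether the freeze is heap exhaustion or something else.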
In a 175-step transformation, there are likely a non-trivial number of bottlenecks. Just to name a few:
- reading/writing files (I/O slows the transformation down)
- updating tables with indexes enforced (the indexes have to be maintained continuously)
- sorting steps (they stop the usual streaming flow in order to perform the sort)
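On the index point, a common mitigation (my suggestion, not something the thread prescribes) is to drop the target table's secondary indexes before the bulk load and recreate them afterwards, so the database builds each index once instead of maintaining it row by row. A MySQL-flavoured sketch, where the database, table, index, and column names are all placeholders:

```shell
# Hypothetical names throughout: mydb, fact_sales, idx_sale_date, sale_date.
mysql mydb -e "ALTER TABLE fact_sales DROP INDEX idx_sale_date;"

# ... run the Kettle transformation that loads fact_sales ...

mysql mydb -e "ALTER TABLE fact_sales ADD INDEX idx_sale_date (sale_date);"
```

This trades load-time index maintenance for a single rebuild at the end; it only helps if nothing else queries the table during the load.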
Chapter 15 of Pentaho Kettle Solutions is an excellent survey of performance-tuning tricks in Kettle.
Besides, I don't know if we can speak of best practice here, but 175 steps sounds to me like a poorly decomposed ETL process. I wonder if you could make better use of hardware resources (memory and CPU) by breaking it into a few transformations/jobs wrapped in a main job: rows can still flow along via the result rows, without touching the disk; debugging/logging will be easier; heap/GC limits are less likely to be hit; and so on.
Join the community on ##pentaho, a freenode IRC channel.