
View Full Version : memory issues.



guzaldon
04-20-2006, 09:54 AM
Is there a way to control how much memory a transformation uses at creation time?
I'm running RHEL 4 with 8 GB of RAM, 8 GB of swap, and a 2.8 GHz 32-bit Xeon, but for some reason I'm stuck at the 4 GB per-process user space limit, so I can only set -Xmx to about 3500m. I think I still need to tweak the kernel, even after installing the hugemem 2.6 kernel, but I keep getting heap dumps. I have read all sorts of stuff on memory allocation per user vs. per kernel. It seems that when the transformation initializes, it allocates threads and memory for the different steps. I'm trying to create my fact tables, which have a lot of foreign keys that need to be looked up, thus a lot of steps, and soon after initializing it's at 3.4 GB. I have tried "sorting" the data, because that writes it to a temp file, but this has not had any effect on Pan heap dumping.
Would the best way to do this be to create two transformations: one to grab all the data and dump it to a file somewhere, and another to read from the file and do the lookups before the fact table inserts?
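
(A quick way to sanity-check that the -Xmx setting actually reaches the JVM is a tiny class like the one below; this is only an illustrative sketch, not part of Kettle, and the class name is made up. Run it with the same flag that Pan is launched with, e.g. java -Xmx3500m HeapCheck.)

    public class HeapCheck {
        public static void main(String[] args) {
            // Report the maximum heap this JVM will try to use, in megabytes.
            long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            System.out.println("JVM max heap: " + maxMb + " MB");
        }
    }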


thanks



Nic

MattCasters
04-22-2006, 07:45 AM
Hi Nic,

What on earth are you doing that requires a GB of RAM?
It's not a requirement of Kettle to load all lookup data into memory ;-)

Yes, in fact I would suggest "staging" the data in a database of your choice and using database lookups.
You can set the cache sizes to tune the performance from there. Sorting can also be done a lot faster by the database.
Sure, it might be slower to stage, but at least you'll be able to run.
Loading all in memory doesn't scale either. What would you do if you got twice the amount of data? Double the RAM?
I bet your IT manager would really love that.... NOT! :-)
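
To sketch the idea in plain JDBC (this is not Kettle's own lookup step, just an illustration; the table, columns and cache size are made up), a database lookup with a bounded cache looks roughly like this:

    import java.sql.*;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class CachedLookup {
        private static final int CACHE_SIZE = 10000;  // plays the role of the step's cache size setting

        // Simple LRU cache: the least recently used entry is dropped once CACHE_SIZE is exceeded.
        private final Map<Long, Long> cache =
                new LinkedHashMap<Long, Long>(CACHE_SIZE, 0.75f, true) {
                    protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
                        return size() > CACHE_SIZE;
                    }
                };

        private final PreparedStatement lookup;

        public CachedLookup(Connection con) throws SQLException {
            lookup = con.prepareStatement(
                    "SELECT dim_key FROM dim_customer WHERE customer_id = ?");
        }

        public Long dimKeyFor(long customerId) throws SQLException {
            Long key = cache.get(customerId);
            if (key == null) {
                lookup.setLong(1, customerId);
                ResultSet rs = lookup.executeQuery();
                key = rs.next() ? rs.getLong(1) : null;
                rs.close();
                if (key != null) {
                    cache.put(customerId, key);  // only found keys are cached
                }
            }
            return key;
        }
    }

Only as many keys as the cache size are ever held in memory, no matter how big the staged table gets.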

Cheers,
Matt

MattCasters
04-22-2006, 07:49 AM
Just to clear this up: a medium-sized transformation should certainly be able to run in under 100 MB.
(even that seems too much)

Matt

guzaldon
04-24-2006, 09:54 AM
Well, I found a couple of things that were killing me on the memory issue.

I was building transformations that would pull the old data out of the OLTP database and start putting it into the OLAP database, but that is a lot of data.

The biggest thing that got me was that I used a left join in the wrong place, which was returning 4 million+ rows multiplied by the other rows it was matching, so I was well into the billions of rows. That's been taken care of, and I believe it was the root cause of the problem.

I knew when I was having memory issues that I had to be doing something wrong, especially when I was up around 3.5 GB and still running out of memory.

I did end up breaking up my transformations quite a bit. I'm still having problems here and there, but I think it's down to the amount of time some of the queries currently take to run on the dev OLTP box, which doesn't have much horsepower.

I'm not the strongest SQL programmer, so I make stupid mistakes, which aggravate me more than anything.

So I hope that helps explain why this problem was happening to me, so others can look at it and think about what they might be doing wrong. The app is sweet, so if you have memory issues you are probably turning something into an unmanageable, unwieldy result because of the wrong SQL statements.

MattCasters
04-24-2006, 11:29 AM
Well, I don't think SQL has anything to do with Kettle memory issues.
There were a few bugs in the JDBC drivers for MySQL and Postgres that caused all rows from a result set to be read into memory, but these are gone now.
The amount of memory used up when dealing with databases is not proportional to the size of the dataset.
However, for "XML Input" that's a different story. The XML document is parsed, checked for validity and stored enterily in memory, and this is what's causing the problem. Ultimately, a SAX parsing algorithm will give us a better solution for "very large" XML documents. This is very complex to write however, at the moment I simply don't have the time, sorry.

All the best,
Matt

JonathonC
04-24-2006, 12:23 PM
Hi Nic,


The 4 GB limit, I think you will find, comes from the 32-bit addressable memory space; it has nothing to do with Kettle, the JVM or otherwise. If you need more than this, you need to look at 64-bit architectures.



Jonathon

guzaldon
04-24-2006, 12:57 PM
Yeah, I have learned all I need to know about the memory limit, thanks.

guzaldon
04-24-2006, 01:02 PM
Well, all I know is that when I watch the transformation run and see that MySQL is sending data or writing it to the network, I can watch Kettle's memory grow with top and free -mt, and it just keeps growing and growing and growing until it heap dumps. So something has to be loaded into memory.

I'm just stating what I have seen.
When I start Pan to run the transformation, it starts at like 48 MB or something and then grows to over 3.5 GB.

I'm not able to share the transformation because I changed it, but I do know that once I changed the left join to an inner join it wasn't growing nearly as much.

Thanks for your help.

Nic

MattCasters
04-24-2006, 01:04 PM
Actually, I thought that some Intel processors had extensions to go beyond that limit on 32-bit machines?
I'm not sure whether the JVM would be able to use that, although I'm pretty sure that Linux can.

In any case, I once loaded 6 GB+ into a StreamLookup step on an 8 GB Solaris box, so as you say, Kettle doesn't set any limits on that either. (Don't ask, please don't ask!)

Cheers,

Matt

MattCasters
04-24-2006, 01:08 PM
Nic,

> There were a few bugs in the JDBC drivers for MySQL and Postgres that caused all rows from a result set to be read into memory, but these are gone now.

I know that for some obscure reason people don't like to tell me which version of Kettle they are using (;-)), but in case you didn't do this already, grab a recent development version and let us all know if that works. Like I said, the bug is already fixed.
Kettle should start reading rows right away instead of loading everything into the JDBC driver.
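
At the JDBC level, "reading right away" with MySQL boils down to asking for a streaming result set. A plain-JDBC sketch (not the Kettle code; the connection details and table name are made up):

    import java.sql.*;

    public class StreamingRead {
        public static void main(String[] args) throws Exception {
            // Older Connector/J versions need the driver class loaded explicitly.
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/oltp", "user", "password");

            // With MySQL Connector/J, a forward-only, read-only statement with a fetch size
            // of Integer.MIN_VALUE makes the driver stream rows one at a time instead of
            // buffering the whole result set in memory.
            Statement st = con.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            st.setFetchSize(Integer.MIN_VALUE);

            ResultSet rs = st.executeQuery("SELECT * FROM big_fact_source");
            long count = 0;
            while (rs.next()) {
                count++;  // process one row at a time; memory use stays flat
            }
            System.out.println("Rows read: " + count);

            rs.close();
            st.close();
            con.close();
        }
    }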

All the best,

Matt

guzaldon
04-24-2006, 01:42 PM
Well, I'm using Kettle 2.2.2 from the Pentaho site. I guess you probably deal with a lot of people, but I remember posting some stuff on the Pentaho site, and I think you mentioned that Kettle would best be supported here until you guys figured out a good way to move these posts over to Pentaho.

I looked for the development versions a bit and did not find them, though I didn't look too hard. I looked at the source code link above and wasn't too sure. I didn't really feel a need to look any farther because the version I have seems to work well enough.

Though I might be looking in the wrong spot.

Thanks again for the help,

Nic

BTW: I wasn't trying to be smug; I was just trying to relate why I thought it might be a problem with Kettle. I was just trying to add some extra info, that's all.

guzaldon
04-24-2006, 01:59 PM
Well, if you want to know what I learned:
Solaris has no limit on memory per user.


It's not so much of a problem with 32-bit processors anymore; with hugemem you can utilize up to 64 GB of RAM. But the limit comes into play with kernel memory vs. user/process memory: by default it allocates 50% of physical memory to the kernel and the rest to user memory.



But the problem with the 4 GB limit is that it's the per-user maximum you can get without hacking your kernel, unless you are on Solaris.



Just some more of that memory stuff I learned and don't want to think about any more.



please drive through



Nic

MattCasters
04-24-2006, 02:04 PM
Hi Nic,


For the time being we will have to make do with the Javaforge site. And yes, over time it will all merge.



Javaforge is not that bad actually; we have an auto-build system that gives you a new kettle.jar to put in Kettle's lib directory five minutes after new code gets committed to the source tree. It's a great feature and something I would like to have on the new system we're setting up.



The kettle.jar ends up on this very site under Documents / Development packages / --> here (http://www.javaforge.com/proj/doc/details.do?doc_id=3719)



I hope you can find this and I bet your problems will vanish and life will be great.



Matt



P.S. Please don't mind my occasional bad joke, I don't get out much ;-)

guzaldon
04-24-2006, 02:11 PM
Cool, thanks for the info, that'll help. I'll check it out here in a few hours or so.


Thanks for the help too.



Nic

guzaldon
04-25-2006, 01:50 PM
After listening to Matt and grabbing the kettle 2.3 jar from his link, I noticed a huge drop in memory usage.


THANKS MATT!!!!!!!!!!!!!!!11



Nic

MattCasters
04-25-2006, 02:15 PM
Sure, no problem. It wasn't me that found the MySQL memory leak BTW.
Search the forum for more info on it if you're into these kinds of things.
;-)