Hi all,
I am working with a large census data set (approximately 14.9 million rows by 103 columns of varying data types) in MySQL, which I managed to upload and transform from a text file. The large table in MySQL has no index.
I am currently trying to create a transformation that summarizes/aggregates the above table down to some 19 columns of data and one measure, by doing a sort, then a group by, and some other transform steps. The problem I am encountering is that the initial Table Input step reads the full table at startup without showing any progress in the "Execution Results" pane; some two or three hours pass and then, for some strange reason, the server connection crashes.
I have also tried getting the DB to do the heavy work by specifying the sort and group by in the SQL of the Table Input step; I get the same server connection crash after some two or three hours.
As an alternative, I have tried to create the summarization/aggregation via a SQL script directly in MySQL: same problem.
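To illustrate the shape of the aggregation described above, here is a minimal sketch. The table name `census` and the columns `region`, `age_band`, and `persons` are hypothetical stand-ins (the real table has 103 columns, with about 19 grouping columns and one measure), and Python's built-in `sqlite3` is used in place of MySQL only so the example is self-contained.

```python
import sqlite3

# Hypothetical schema: three columns stand in for the real 103-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (region TEXT, age_band TEXT, persons INTEGER)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [
        ("North", "0-14", 10),
        ("North", "0-14", 5),
        ("North", "15-64", 7),
        ("South", "0-14", 3),
    ],
)

# Group by the dimension columns and sum the single measure,
# letting the database do the sort and the aggregation.
summary = conn.execute(
    "SELECT region, age_band, SUM(persons) AS persons "
    "FROM census "
    "GROUP BY region, age_band "
    "ORDER BY region, age_band"
).fetchall()

for row in summary:
    print(row)
```

In the real case the GROUP BY list would contain the 19 or so dimension columns rather than two.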
I am working on Windows Ultimate 64-bit with MySQL 5.5.16, also 64-bit; my box is a Core i7 with 8 GB of RAM, and I have raised the memory allocation in spoon.bat to 2 GB.
I originally tried the above against MS SQL Server 2008 R2 and encountered similar problems.
Can anyone advise what configuration in Kettle or in MySQL I should adjust to allow this long-running process to complete correctly?
Kind regards, DMurray3


