Hitachi Vantara Pentaho Community Forums

Thread: Problem with a big SQL table to export

  1. #1
    Join Date
    Mar 2014
    Posts
    19

    Question Problem with a big SQL table to export

    Hi,
    I need to copy a big SQL table (roughly 800 MB) into a file. XLSX would be great (I even tried the special streaming option in the corresponding file output step, but it fails), so CSV is fine too. I have tried a lot of approaches, but an out-of-memory error stopped me most of the time, even after giving more memory to the JVM.
    Last but not least, I tried to let the SQL server take care of this with a SELECT ... INTO OUTFILE, but I can't make Kettle execute it. So I put the entire statement into a txt file and tried to make Kettle run it with the command SOURCE /home/..../example.txt, but I receive only errors.
    Could someone help me, please?

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    You really should show us the error messages you get.
    Also your transformation attached to the thread would be helpful.

    800 MB isn't much for Kettle to handle, but I wouldn't write that amount of data to a spreadsheet ever. What for anyway? Export to a CSV file and let somebody else watch Excel choke on it.

    IMO you should concentrate on your last idea, if possible: Let the server export to a CSV file by its own.
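    One client-side variant of that idea (a sketch, not from the thread: user, database and paths are made-up examples) is to let the mysql command-line client stream the result to a file on the client machine, unlike INTO OUTFILE, which writes on the server host:

    ```shell
    # Sketch: `mysql --batch` prints tab-separated rows to stdout,
    # which can be redirected to a file on the client machine.
    DB_USER="someuser"   # assumption
    DB_NAME="somedb"     # assumption
    QUERY="SELECT ID_JOB, JOBNAME, STATUS, ENDDATE FROM log_Job_app"

    # Uncomment to run against a real server:
    # mysql -u "$DB_USER" -p --batch -e "$QUERY" "$DB_NAME" > /tmp/log_job_export.tsv
    echo "$QUERY"
    ```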
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Mar 2014
    Posts
    19

    Default

    Hi marabu, thanks for the reply
    Here is the error:
    2014/12/24 10:05:43 - Spoon - Transformation opened.
    2014/12/24 10:05:43 - Spoon - Executing transformation [Trasformazione 1]...
    2014/12/24 10:05:43 - Spoon - Transformation execution started.
    2014/12/24 10:05:43 - Trasformazione 1 - Dispatching started for transformation [Trasformazione 1]
    2014/12/24 10:05:43 - Transformation metadata - Natural sort of steps executed in 0 ms (7 previously calculated)
    2014/12/24 10:05:43 - data e data_inizio.0 - Optimization level set to 9.
    2014/12/24 10:05:43 - Generate Rows 2.0 - Processing finished (I=0, O=0, R=0, W=1, U=0, E=0)
    2014/12/24 10:05:44 - Set Variables.0 - Setting environment variables...
    2014/12/24 10:05:44 - Set Variables.0 - Setting variable DATA to value [2014/10/31 23:30:17.000]
    2014/12/24 10:05:44 - Set Variables.0 - Setting variable DATA_INIZIO to value [2014/10/01 23:30:17.000]
    2014/12/24 10:05:44 - data e data_inizio.0 - Processing finished (I=0, O=0, R=1, W=1, U=0, E=0)
    2014/12/24 10:05:44 - Write to log.0 -
    2014/12/24 10:05:44 - Write to log.0 - ------------> Line nr 1------------------------------
    2014/12/24 10:05:44 - Write to log.0 - pu = 1
    2014/12/24 10:05:44 - Write to log.0 - data = 2014/10/31 23:30:17.000
    2014/12/24 10:05:44 - Write to log.0 - data_inizio = 2014/10/01 23:30:17.000
    2014/12/24 10:05:44 - Write to log.0 -
    2014/12/24 10:05:44 - Write to log.0 - ====================
    2014/12/24 10:05:44 - Set Variables.0 - Finished after 1 rows.
    2014/12/24 10:05:44 - Set Variables.0 - Processing finished (I=0, O=0, R=1, W=1, U=0, E=0)
    2014/12/24 10:05:44 - Write to log.0 - Processing finished (I=0, O=0, R=1, W=1, U=0, E=0)
    2014/12/24 10:05:44 - Select values.0 - Processing finished (I=0, O=0, R=1, W=1, U=0, E=0)
    2014/12/24 10:05:59 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : UnexpectedError:
    2014/12/24 10:05:59 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : java.lang.OutOfMemoryError: Java heap space
    2014/12/24 10:05:59 - Da_log_Job_app.0 - Finished reading query, closing connection.
    2014/12/24 10:05:59 - Microsoft Excel Writer.0 - Processing finished (I=0, O=1427, R=1426, W=1425, U=0, E=0)
    2014/12/24 10:06:04 - Da_log_Job_app.0 - Processing finished (I=4637, O=0, R=1, W=4636, U=0, E=1)
    2014/12/24 10:06:04 - Spoon - The transformation has finished!
    2014/12/24 10:06:04 - Trasformazione 1 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Errors detected!
    2014/12/24 10:06:04 - Trasformazione 1 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Errors detected!
    2014/12/24 10:06:04 - Trasformazione 1 - Trasformazione 1
    2014/12/24 10:06:04 - Trasformazione 1 - Trasformazione 1
    I tried to attach the transformation, but unfortunately I can't upload any file; it gives me this error: 500 [IOErrorEvent type="ioError" bubbles=false cancellable=false eventPhase=2 text="Error #2038"]
    Anyway, my transformation is not that complicated. It is composed of a Generate Rows step (to make the next Modified Java Script Value work), a Modified Java Script Value step (to generate a date interval: begin date and end date), a Table Output step (that reads the Kettle log_Job table from the database) and a Microsoft Excel Writer step with "Stream XLSX data" checked.
    Sometimes it works, but most of the time it doesn't. Sometimes the Table Output goes wrong and sometimes the Microsoft Excel Writer goes wrong.
    I tried a lot of steps after the Table Output (CSV file output, text file output, etc.), but nothing works properly.
    The biggest problem is the Table Output, which makes Kettle go out of memory.
    About my idea of using SOURCE with MySQL: unfortunately I can't use it, because the JDBC driver doesn't let you use it.

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Your description and the log lines don't match.
    Anyway, you seem to have memory issues (java heap error).
    You can either provide more memory or adjust buffer sizes.
    I keep hearing POI is a memory hog, so I really would stay away from XLSX for data transportation.
    SOURCE is a mysql shell command, so not very useful in a SQL script.
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Mar 2014
    Posts
    19

    Default

    I thought I had to use SOURCE, because I believed I couldn't use a SELECT ... INTO OUTFILE statement in Kettle, but now I see I was wrong, so I'm trying this solution.
    I use this query:
    SELECT '"ID_JOB"','"CHANNEL_ID"','"JOBNAME"','"STATUS"','"LINES_READ"','"LINES_WRITTEN"','"LINES_UPDATED"','"LINES_INPUT"','"LINES_OUTPUT"','"LINES_REJECTED"','"ERRORS"','"STARTDATE"','"ENDDATE"','"LOGDATE"','"DEPDATE"','"REPLAYDATE"','"LOG_FIELD"' UNION ALL SELECT ID_JOB,CHANNEL_ID,JOBNAME,STATUS,LINES_READ,LINES_WRITTEN,LINES_UPDATED,LINES_INPUT,LINES_OUTPUT,LINES_REJECTED,ERRORS,STARTDATE,ENDDATE,LOGDATE,DEPDATE,REPLAYDATE,LOG_FIELD FROM log_Job_app WHERE ENDDATE BETWEEN ? AND ? INTO OUTFILE '/tmp/prova_export_2.csv' FIELDS TERMINATED BY ';' ENCLOSED BY '' LINES TERMINATED BY '\n' ;
    The Table Output creates the file prova_export_2.csv, but it gives an error like this:
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unexpected error
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : org.pentaho.di.core.exception.KettleDatabaseException:
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Couldn't get row from result set
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : ResultSet is from UPDATE. No Data.
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.core.database.Database.getRow(Database.java:2758)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.core.database.Database.getRow(Database.java:2730)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.core.database.Database.getRow(Database.java:2704)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.steps.tableinput.TableInput.doQuery(TableInput.java:257)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.steps.tableinput.TableInput.processRow(TableInput.java:143)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at java.lang.Thread.run(Thread.java:744)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Caused by: java.sql.SQLException: ResultSet is from UPDATE. No Data.
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1084)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:973)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:6995)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.core.database.Database.getRow(Database.java:2746)
    2014/12/24 13:23:45 - Da_log_Job_app.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : ... 6 more
    I'll keep trying this way; what's your opinion?

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Since you seem to be able to generate the export file via SQL, I would expect you to use a SQL job entry.
    A Table Input is only used when you want to access a query result.
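    For what it's worth, the entire script of such a SQL job entry could be as small as this sketch (column and file names reused from the thread; note that the file lands on the database server's filesystem, and the MySQL account needs the FILE privilege):

    ```sql
    -- A SQL job entry just executes this; no rows come back to
    -- Kettle, MySQL itself writes the CSV on the server host.
    SELECT ID_JOB, JOBNAME, STATUS, ERRORS, ENDDATE
    FROM log_Job_app
    INTO OUTFILE '/tmp/prova_export_2.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n';
    ```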
    So long, and thanks for all the fish.

  7. #7
    Join Date
    Mar 2014
    Posts
    19

    Default

    I tried this way before, but I got errors. It seems it can't perform variable substitution, so I split my transformation into two different transformations, and I get this error:
    2014/12/29 09:14:03 - Execute SQL script.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : An error occurred, processing will be stopped:
    2014/12/29 09:14:03 - Execute SQL script.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Couldn't get row from result set
    2014/12/29 09:14:03 - Execute SQL script.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : ResultSet is from UPDATE. No Data.
    2014/12/29 09:14:03 - Execute SQL script.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Error initializing step [Execute SQL script]
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Step [Execute SQL script.0] failed during initialization!
    2014/12/29 09:14:03 - Execute SQL script.0 - Finished reading query, closing connection.
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unable to prepare the execution of the transformation
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : org.pentaho.di.core.exception.KettleException:
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : At least one step failed to initialize. Execution cannot start!
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.Trans.prepareExecution(Trans.java:932)
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.Trans.execute(Trans.java:504)
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:1034)
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:589)
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:728)
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:728)
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:443)
    2014/12/29 09:14:03 - Trasformazione 3 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.run(Job.java:363)
    This is the query I used:
    SELECT "ID_JOB","CHANNEL_ID","JOBNAME","STATUS","LINES_READ","LINES_WRITTEN","LINES_UPDATED","LINES_INPUT","LINES_OUTPUT","LINES_REJECTED","ERRORS","STARTDATE","ENDDATE","LOGDATE","DEPDATE","REPLAYDATE","LOG_FIELD" UNION ALL SELECT ID_JOB,CHANNEL_ID,JOBNAME,STATUS,LINES_READ,LINES_WRITTEN,LINES_UPDATED,LINES_INPUT,LINES_OUTPUT,LINES_REJECTED,ERRORS,STARTDATE,ENDDATE,LOGDATE,DEPDATE,REPLAYDATE,LOG_FIELD FROM log_Job_app WHERE ENDDATE BETWEEN '?' AND '?' INTO OUTFILE '/tmp/prova_export_2.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' ;
    I used Table Output because, after this step, I would like to use the Microsoft Excel Writer step...

  8. #8
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    I'm still thinking you are better off with a SQL job entry. You can't use a prepared statement, but that's not necessary in your scenario - just use Kettle variables.

    In your error log the relevant part to me is "Couldn't get row from result set -- ResultSet is from UPDATE. No Data.".
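    With "Use variable substitution?" checked in the job entry, the script could use the variables set earlier in the thread instead of the '?' place holders (a sketch; Kettle replaces the variables as plain text before the statement reaches MySQL):

    ```sql
    -- ${DATA_INIZIO} and ${DATA} are substituted textually by Kettle,
    -- so the single quotes here are ordinary string literals,
    -- not prepared-statement place holders.
    SELECT ID_JOB, JOBNAME, STATUS, ENDDATE
    FROM log_Job_app
    WHERE ENDDATE BETWEEN '${DATA_INIZIO}' AND '${DATA}'
    INTO OUTFILE '/tmp/prova_export_2.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n';
    ```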

    BTW: Where did you learn to put single quotes around the place holders for statement preparation?
    So long, and thanks for all the fish.

  9. #9

    Default

    I had a similar problem when I implemented Pentaho in my company. What I needed to do was to increase the memory allocation (not by much) in the startup script and run the transformation from the command line using the Kettle.bat/Pan.bat executables.

    After I did that, things started working really smoothly.
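    In PDI terms that amounts to something like the following sketch (the .ktr path is made up; PENTAHO_DI_JAVA_OPTIONS is honored by recent PDI launch scripts, while older versions hard-code the -Xmx value inside spoon.sh/pan.sh, so there you edit the script directly):

    ```shell
    # Raise the heap for the launcher scripts.
    export PENTAHO_DI_JAVA_OPTIONS="-Xmx2048m"

    # Run the transformation headless with Pan instead of Spoon
    # (hypothetical .ktr path; uncomment on a real installation):
    # ./pan.sh -file=/path/to/export_log_job.ktr -level=Basic
    echo "$PENTAHO_DI_JAVA_OPTIONS"
    ```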

  10. #10
    Join Date
    Mar 2014
    Posts
    19

    Thumbs up

    I solved it using a Call DB Procedure step.

    BTW: Where did you learn to put single quotes around the place holders for statement preparation?
    I learned it using kettle and here: http://wiki.pentaho.com/display/EAI/Execute+SQL+script
    Strings need to be enclosed with quotes or double quotes depending on the database dialect
    Thanks

  11. #11

    Default

    I have found that adding the Java option "-XX:MaxPermSize=128m" fixes most heap space errors. Of course, also try adjusting the maximum heap option: -Xmx1024m, -Xmx1536m or -Xmx2048m.
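    Combined, and with example values only, that could look like this (a sketch; recent PDI launch scripts read PENTAHO_DI_JAVA_OPTIONS, older ones keep these flags in the OPT= line of spoon.sh/Spoon.bat):

    ```shell
    # Both knobs at once: a bigger heap plus a larger PermGen.
    # Tune the numbers to the machine's available RAM.
    export PENTAHO_DI_JAVA_OPTIONS="-Xmx2048m -XX:MaxPermSize=128m"
    echo "$PENTAHO_DI_JAVA_OPTIONS"
    ```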
