
View Full Version : jira solution - big jira.xml



akostadinov
05-11-2007, 11:42 AM
Hi, is anybody using the JIRA solution? I wanted to try it out, but Kettle appears to have problems importing big XML files. Mine is 221 MB and Java dies with an OutOfMemoryError even with a dedicated 1.5 GB heap.

Any workaround? I'm not going to dedicate a machine with 6 GB or more just to loading the jira.xml...
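For what it's worth, the Kitchen heap in Kettle 2.x is usually set directly on the java invocation inside kitchen.sh, so raising -Xmx there is the usual route. The line below is only a sketch: the main class name is assumed from the be.ibridge.kettle 2.x package layout, and the real classpath differs per install.

```shell
# Sketch only -- the actual kitchen.sh in a Kettle 2.x install differs.
# Raise -Xmx on the script's java invocation to give Kitchen a bigger heap.
# The Kitchen main class name is assumed from the be.ibridge.kettle layout.
java -Xmx2048m -cp "$CLASSPATH" be.ibridge.kettle.kitchen.Kitchen "$@"
```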

akostadinov
05-21-2007, 03:52 PM
Cool, I've tried with a 20 GB heap size + 12 GB spare RAM for the OS + the rest for the JVM... and I still got an OOM at a later stage of the data loading process.

2007/05/21 09:57:24 - jira_int_status - Initialising 4 steps...
2007/05/21 09:57:24 - xml - component.0 - Starting to run...
2007/05/21 09:57:24 - xml - component.0 - Opening file: /usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/../jira.xml
2007/05/21 09:57:24 - Stream lookup.0 - Starting to run...
2007/05/21 09:57:24 - Stream lookup.0 - Reading lookup values from step [jira_status.xls]
2007/05/21 09:57:24 - jira_status.xls.0 - Starting to run...
2007/05/21 09:57:24 - INT_STATUS.0 - Starting to run...
2007/05/21 09:59:56 - jira_status.xls.0 - Finished processing (I=0, O=0, R=0, W=7, U=0, E=0)
2007/05/21 09:59:56 - Stream lookup.0 - Read 7 values in memory for lookup!
2007/05/21 13:57:40 - xml - component.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0)
Exception in thread "xml - component.0 (Thread-42)" java.lang.OutOfMemoryError: Java heap space

JIRA solution developers, do you have any idea how to fix this, or at least whether fixing it is possible and/or feasible?

ngoodman
05-21-2007, 11:09 PM
Cool, I've tried with a 20 GB heap size + 12 GB spare RAM for the OS + the rest for the JVM... and I still got an OOM at a later stage of the data loading process.

JIRA solution developers, do you have any idea how to fix this, or at least whether fixing it is possible and/or feasible?

Thanks for posting. I haven't updated the Jira/Bugzilla solution for Kettle 2.5.0 and the GA release of the Pentaho platform. I was on sabbatical for a while, but I'm back and need to have a look at updating it soon.

I submitted a Kettle bug report on the memory issues you appear to be having:
http://www.javaforge.com/proj/tracker/itemDetails.do?navigation=true&task_id=4143

And the response has come back that it should be resolved in newer versions of Kettle. Assuming you don't want to wait for me to update the solution (and do another release) you are welcome to try the solution on version 2.5.x of Kettle.

Also, consider running the subjobs separately. For instance, instead of running ./kitchen.sh /file=bugz_do_everything.kjb, run the jobs individually:

./kitchen.sh /file=load_bugz_int_stage1.kjb
./kitchen.sh /file=load_bugz_int_stage2.kjb
./kitchen.sh /file=load_dimensions.kjb
./kitchen.sh /file=load_facts.kjb
./kitchen.sh /file=load_summaries.kjb
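The sequence above can also be wrapped in a small script (a sketch, using the job names from this post) that stops at the first failure, so a later stage never runs against half-loaded staging tables:

```shell
# Run each subjob in order; abort on the first failure so later stages
# never see partially loaded data. Job names are from the post above.
run_jobs() {
    for job in load_bugz_int_stage1 load_bugz_int_stage2 \
               load_dimensions load_facts load_summaries; do
        ./kitchen.sh /file="${job}.kjb" || { echo "${job} failed" >&2; return 1; }
        echo "${job} done"
    done
}
# Usage (from the directory holding kitchen.sh and the .kjb files):
# run_jobs
```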

Ultimately I'd like to build another release for everyone involved. Thanks for your interest and I'll definitely post something when I do a new build.

Let me know if any of the above helps out.

Nick

akostadinov
05-22-2007, 09:43 AM
Thank you for the suggestion. I'll try it and let you know what happens.

akostadinov
05-22-2007, 05:51 PM
Tried running the tasks separately:

ETL=/usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl
./kitchen.sh -file=$ETL/load_jira_staging.kjb    && echo 1 && \
./kitchen.sh -file=$ETL/load_jira_int_stage1.kjb && echo 2 && \
./kitchen.sh -file=$ETL/load_jira_int_stage2.kjb && echo 3 && \
./kitchen.sh -file=$ETL/load_dimensions.kjb      && echo 4 && \
./kitchen.sh -file=$ETL/load_facts.kjb           && echo 5 && \
./kitchen.sh -file=$ETL/load_summaries.kjb

I used JRockit and it completed much faster as well, using a maximum of 14 GB RAM. Will it be any better with a recent Kettle?

ngoodman
05-25-2007, 06:23 PM
I used JRockit and it completed much faster as well, using a maximum of 14 GB RAM. Will it be any better with a recent Kettle?

I certainly hope so. I've tested the Bugzilla side of the solution and it's running quicker. I'll update the JIRA side as soon as I can and post an update.

What did you think? Will you find the reports helpful?

akostadinov
06-05-2007, 03:38 AM
I've had some data inconsistency issues, so I didn't actually get any useful reports. I hope to have some more time for it this week. I'll update you as soon as I get it working (or not).

akostadinov
06-14-2007, 11:04 AM
Hello again. I haven't had time to look further into the issue until now. Do you have any idea whether the error I see below is due to a data inconsistency (in the JIRA data), or could it be caused by something else?

Thanks much,
Aleksandar

2007/06/14 10:48:11 - Unique rows.0 - Starting to run...
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : java.sql.SQLException: Duplicate entry '12315028-2005-05-12 11:13:09' for key 1
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2975)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1600)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1125)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:677)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1357)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1274)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1259)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at be.ibridge.kettle.core.database.Database.insertRow(Database.java:1456)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at be.ibridge.kettle.trans.step.tableoutput.TableOutput.writeToTable(TableOutput.java:178)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at be.ibridge.kettle.trans.step.tableoutput.TableOutput.processRow(TableOutput.java:72)
2007/06/14 10:48:11 - Bugs - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : at be.ibridge.kettle.trans.step.tableoutput.TableOutput.run(TableOutput.java:309)
2007/06/14 10:48:11 - stg_jira_issue_history.0 - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : Because of an error, this step can't continue:
2007/06/14 10:48:11 - stg_jira_issue_history.0 - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : Error inserting row into table [stg_jira_issue_history] with values: [ISSUE_NAT_ID= 012315028, ISSUE_ACTION_DATE=2005/05/12 11:13:09.000]
2007/06/14 10:48:11 - stg_jira_issue_history.0 - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) :
2007/06/14 10:48:11 - stg_jira_issue_history.0 - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : Error inserting row
2007/06/14 10:48:11 - stg_jira_issue_history.0 - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : Duplicate entry '12315028-2005-05-12 11:13:09' for key 1
2007/06/14 10:48:11 - stg_jira_issue_history.0 - Finished processing (I=0, O=489, R=490, W=0, U=0, E=1)
2007/06/14 10:48:11 - jira_int_issue_step1 - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : Errors detected!
2007/06/14 10:48:11 - jira_int_issue_step1 - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : Errors detected!
2007/06/14 10:48:11 - CREATE RECORDS.0 - Finished reading query, closing connection.
2007/06/14 10:48:11 - UPDATED RECORDS.0 - Finished reading query, closing connection.

akostadinov
06-26-2007, 05:28 PM
I get the same primary key duplication issue with a few JIRA backup XMLs, even ones taken just after a data consistency verification (run via the button in the JIRA interface). In the few attempts I made to load the data into MySQL, I got the error at the same point with the same data value.

Is there any other way to verify JIRA data consistency? Any idea how I can work around or fix the problem?
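One way to hunt for the offending rows before loading is to extract the (issue id, timestamp) pairs from the backup XML and list any that repeat. This is only a sketch: the element and attribute names (ChangeGroup, issue=, created=) are a guess at the jira.xml format and may need adjusting, and it assumes each element sits on one line.

```shell
# Print duplicated (issue id, created timestamp) pairs from a JIRA backup
# XML. Element/attribute names are assumptions -- adjust them to match the
# actual jira.xml format; assumes one element per line.
find_dup_history() {
    sed -n 's/.*<ChangeGroup [^>]*issue="\([0-9]*\)"[^>]*created="\([^"]*\)".*/\1 \2/p' \
        "$1" | sort | uniq -d
}
# Usage: find_dup_history jira.xml   # prints nothing when there are no duplicates
```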

And just a suggestion for the documentation: to get the Project IDs from the jira.xml you can use the (admittedly ugly) command below:
sed -n 's/.*<Project .*id="\([0-9]\+\)".*/\1/p' jira.xml | sort