Hitachi Vantara Pentaho Community Forums

Thread: jira solution - big jira.xml

  1. #1
    Join Date
    May 2007
    Posts
    8

    Default jira solution - big jira.xml

    Hi, anybody using the JIRA solution? I wanted to try it out, but Kettle appears to have problems importing big XMLs. Mine is 221 MB and Java dies with an OutOfMemoryError even with a dedicated 1.5 GB heap.

    Any workaround? I'm not going to dedicate a machine with 6 GB or more of RAM just to loading the jira.xml...
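
    For reference, the heap Kettle gets is whatever -Xmx the launcher passes to java. A minimal sketch of what I mean by "dedicated 1.5 GB", assuming your kitchen.sh (or spoon.sh) sets the memory option in a variable near its java invocation; the exact variable name may differ in your Kettle version:

    # sketch only: look for the line in kitchen.sh that builds the java command
    # and raise the -Xmx value there
    OPT="-Xmx1500m"    # the dedicated 1.5 GB heap mentioned above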

  2. #2
    Join Date
    May 2007
    Posts
    8

    Cool

    Cool, I've tried with a 20 GB heap size plus 12 GB of spare RAM for the OS and the rest of the JVM... I still got an OOM at a later stage of the data loading process.

    2007/05/21 09:57:24 - jira_int_status - Initialising 4 steps...
    2007/05/21 09:57:24 - xml - component.0 - Starting to run...
    2007/05/21 09:57:24 - xml - component.0 - Opening file: /usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/../jira.xml
    2007/05/21 09:57:24 - Stream lookup.0 - Starting to run...
    2007/05/21 09:57:24 - Stream lookup.0 - Reading lookup values from step [jira_status.xls]
    2007/05/21 09:57:24 - jira_status.xls.0 - Starting to run...
    2007/05/21 09:57:24 - INT_STATUS.0 - Starting to run...
    2007/05/21 09:59:56 - jira_status.xls.0 - Finished processing (I=0, O=0, R=0, W=7, U=0, E=0)
    2007/05/21 09:59:56 - Stream lookup.0 - Read 7 values in memory for lookup!
    2007/05/21 13:57:40 - xml - component.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=0)
    Exception in thread "xml - component.0 (Thread-42)" java.lang.OutOfMemoryError: Java heap space

    JIRA solution developers, any idea how to fix this, or at least whether it's possible and/or feasible?

  3. #3
    Join Date
    Jun 2005
    Posts
    144

    Default Needs to be updated

    Quote Originally Posted by akostadinov
    Cool, I've tried with a 20 GB heap size plus 12 GB of spare RAM for the OS and the rest of the JVM... I still got an OOM at a later stage of the data loading process.

    JIRA solution developers, any idea how to fix this, or at least whether it's possible and/or feasible?
    Thanks for posting. I haven't updated the JIRA/Bugzilla solution for Kettle 2.5.0 and the GA release of the Pentaho platform. I was on sabbatical for a while, but I'm back and will have a look at updating it soon.

    I submitted a Kettle bug report on the memory issues you appear to be having:
    http://www.javaforge.com/proj/tracke...e&task_id=4143

    The response that came back is that it should be resolved in newer versions of Kettle. If you don't want to wait for me to update the solution (and do another release), you are welcome to try it on Kettle 2.5.x yourself.

    Also, consider running the subjobs separately. For instance, instead of running ./kitchen.sh /file=bugz_do_everything.kjb, run the jobs one at a time (a wrapper that chains them and stops on the first failure is sketched after the list):

    ./kitchen.sh /file=load_bugz_int_stage1.kjb
    ./kitchen.sh /file=load_bugz_int_stage2.kjb
    ./kitchen.sh /file=load_dimensions.kjb
    ./kitchen.sh /file=load_facts.kjb
    ./kitchen.sh /file=load_summaries.kjb
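
    A minimal wrapper sketch, assuming the job files above sit next to kitchen.sh; it runs each stage in order and relies on Kitchen's non-zero exit code to abort on the first failure:

    #!/bin/sh
    # run the stages in order; stop as soon as one job fails
    for job in load_bugz_int_stage1 load_bugz_int_stage2 load_dimensions load_facts load_summaries
    do
        ./kitchen.sh /file=$job.kjb || { echo "$job failed"; exit 1; }
    done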

    Ultimately I'd like to build another release for everyone involved. Thanks for your interest and I'll definitely post something when I do a new build.

    Let me know if any of the above helps out.

    Nick

  4. #4
    Join Date
    May 2007
    Posts
    8

    Default

    Thank you for the suggestion. I'll try it and let you know what happens.

  5. #5
    Join Date
    May 2007
    Posts
    8

    Default

    Tried running tasks separately:
    ./kitchen.sh -file=/usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/load_jira_staging.kjb && echo 1 &&
    ./kitchen.sh -file=/usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/load_jira_int_stage1.kjb && echo 2 &&
    ./kitchen.sh -file=/usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/load_jira_int_stage2.kjb && echo 3 &&
    ./kitchen.sh -file=/usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/load_dimensions.kjb && echo 4 &&
    ./kitchen.sh -file=/usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/load_facts.kjb && echo 5 &&
    ./kitchen.sh -file=/usr/local/pentahoOBS/pentaho-server/pentaho-solutions/software-quality/data/etl/load_summaries.kjb

    I used JRockit and it completed much faster as well, using a maximum of 14 GB of RAM. Will it be any better with a recent Kettle?

  6. #6
    Join Date
    Jun 2005
    Posts
    144

    Default Updating for Kettle 2.5

    Quote Originally Posted by akostadinov
    I used JRockit and it completed much faster as well, using a maximum of 14 GB of RAM. Will it be any better with a recent Kettle?
    I certainly hope so. I've tested the Bugzilla side of the solution and it's running quicker. I'll update the JIRA side as soon as I can and post an update.

    What did you think? Will you find the reports helpful?

  7. #7
    Join Date
    May 2007
    Posts
    8

    Default

    I get the same primary key duplication issue with several JIRA backup XMLs, even ones taken right after a data consistency verification (run via the button in the JIRA administration interface). In the few attempts I made to load the data into MySQL, I got the error at the same point with the same data value.

    Any other way to verify JIRA data consistency? Any idea how I can work around or fix the problem?
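
    In case it helps with diagnosing it, this is the kind of query I run to spot the offending rows once they are in MySQL; the table and column names are only placeholders for whichever staging table the load step complains about:

    -- placeholder names: substitute the staging table/key column from the error
    SELECT issue_id, COUNT(*)
    FROM stg_jira_issue
    GROUP BY issue_id
    HAVING COUNT(*) > 1;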

    And just a suggestion for the documentation: to get the project IDs out of jira.xml, you can use the ugly command below:
    cat jira.xml | sed -e 's/.*\(<Project \).*\(id="\)\([0-9]\+\)".*/\3/p' -e 'd' | sort
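
    The same thing, slightly tidied (same assumptions as above: GNU sed for the \+ escape, and that the <Project ...> tag and its id attribute sit on one line):

    # print only the captured project id for lines containing <Project ... id="NNN"
    sed -n 's/.*<Project .*id="\([0-9]\+\)".*/\1/p' jira.xml | sort -n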
