Hitachi Vantara Pentaho Community Forums

Thread: Splitting Large XML Files

  1. #1
    Join Date
    Sep 2008

    Splitting Large XML Files

    Hi All,

    I need to find a way to split a large xml file into smaller files.

    The xml file being used is a database dump from the JIRA issue tracking system.

    When the file is generated, each row of a table in the JIRA DB is written to the file as an XML element named after the table, with the table's fields as its attributes.

    <Action id="" issue="" author="" type="comment" body="" created="" updateauthor="" updated=""/>

    What I would like to do is to create a process where the large file is split up into smaller files.

    I started by creating a "director" XML file that contains a list of entity names. The process would read it, find all entities with the same name in the JIRA XML file, and output them into a new file, creating 70 smaller XML files.

    <xml_item id="001" entity="Action" directory="001-Action-xml-import" filename="actions_proxy.xml"/>

    So, using the details in the director entry above, every XML entry in the JIRA DB XML dump that is called "Action" should be found and written to an output file called "actions_proxy.xml".
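
    The split described above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: that each row element is a direct child of the dump's root element, that the director file has one <xml_item> per entity as shown, and that it is fine to wrap each output in a made-up <entities> root. The function names load_director and split_dump are hypothetical, not part of Kettle or JIRA.

```python
import os
import xml.etree.ElementTree as ET


def load_director(director_path):
    """Map entity name -> relative output path from the <xml_item> entries."""
    targets = {}
    for item in ET.parse(director_path).iter("xml_item"):
        entity, filename = item.get("entity"), item.get("filename")
        if entity and filename:
            targets[entity] = os.path.join(item.get("directory", ""), filename)
    return targets


def split_dump(dump_path, targets, out_root="."):
    """Stream the dump once and copy each matching row element to its file.

    Returns a dict of entity name -> number of elements written.
    """
    handles, counts = {}, {}
    depth, root = 0, None
    try:
        # iterparse avoids loading the whole dump into memory.
        for event, elem in ET.iterparse(dump_path, events=("start", "end")):
            if event == "start":
                if root is None:
                    root = elem
                depth += 1
                continue
            depth -= 1
            if depth != 1:
                continue  # assumption: rows are direct children of the root
            rel_path = targets.get(elem.tag)
            if rel_path is not None:
                out = handles.get(elem.tag)
                if out is None:
                    path = os.path.join(out_root, rel_path)
                    os.makedirs(os.path.dirname(path), exist_ok=True)
                    out = open(path, "w", encoding="utf-8")
                    # <entities> is a hypothetical wrapper so the output is well-formed.
                    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<entities>\n')
                    handles[elem.tag] = out
                elem.tail = None  # drop whitespace that followed the element
                out.write("  " + ET.tostring(elem, encoding="unicode") + "\n")
                counts[elem.tag] = counts.get(elem.tag, 0) + 1
            root.clear()  # free rows that have already been handled
    finally:
        for out in handles.values():
            out.write("</entities>\n")
            out.close()
    return counts
```

    With placeholder file names, split_dump("jira_dump.xml", load_director("director.xml")) would make a single pass over the dump and keep one output file open per entity, so no per-entity loop over the large file is needed.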

    I have read in other forum threads that looping is not done within Kettle, so I am wondering whether there is a way of doing this.

    Thx in advance

  2. #2
    Join Date
    May 2006


    Wrong tool, I think ... you can probably do it with some major workarounds and a lot of time.

    If it's a one-off migration, write a nice Perl or Python script as a dirty hack.


  3. #3
    Join Date
    Sep 2008


    LOL :-D I thought as much. No problem, I'll write a third-party script for it.

    The rest of the tool is pure genius thx Pentaho.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.