Hitachi Vantara Pentaho Community Forums

Thread: Get Data from XML - start on certain line

  1. #1
    Join Date: Nov 2011 | Posts: 12

    I have an extremely large XML file that I need to import. Right now I manually split it into two files, a and b. Is there a way to make the 'Get Data From XML' step start at a certain line? That would help greatly: I can use the limit feature to pull the first 700000, but then I need a way to tell the step to start at 700001.

    Thanks

  2. #2
    Join Date: Apr 2008 | Posts: 4,696

    Have you looked at the StAX XML step?
    http://kettle.bleuel.com/2011/06/24/...tep-in-pdi-42/

    I recall there being a post on Matt's blog about it too, but I can't find it.

    It's supposed to handle very large XML without loading the whole file into memory.
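
For anyone wondering what "streaming" buys you here, below is a minimal sketch of the underlying StAX technique using the standard `javax.xml.stream` API. It is not the PDI step itself (that is configured in Spoon, not coded). The class name `StaxSkipSketch`, the method `readFrom` and the record tag `row` are hypothetical, and the sketch assumes each record is a flat, text-only element that is never nested inside another record of the same name.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only; not part of PDI.
class StaxSkipSketch {

    /**
     * Streams the XML from start to end, ignores the first `skip` elements
     * named `recordTag`, and returns the text of up to `limit` records after
     * that. Only the collected results are kept in memory; the document
     * itself is never loaded as a whole.
     * Assumption: each record element contains text only (no child elements).
     */
    static List<String> readFrom(Reader xml, String recordTag, long skip, long limit)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> out = new ArrayList<>();
        long seen = 0; // number of record start tags encountered so far
        try {
            while (reader.hasNext() && out.size() < limit) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && recordTag.equals(reader.getLocalName())) {
                    seen++;
                    if (seen > skip) {
                        // Reads the element's text and moves to its end tag.
                        out.add(reader.getElementText());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return out;
    }
}
```

With this sketch, something like `StaxSkipSketch.readFrom(new java.io.FileReader("big.xml"), "row", 700000, 700000)` would correspond to "start at record 700001" (the file name is made up). The skipped records still have to be parsed, since XML offers no way to jump to record N, but they are discarded as they stream past instead of being held in memory.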

  3. #3
    Join Date: Nov 2008 | Posts: 777

    Have you tried the "Prune path to handle large files" option on the Content tab of the Get Data from XML step? It is supposed to put the step into streaming mode and process large files faster. The documentation is not up to date on this, though: the option is not covered at http://wiki.pentaho.com/display/EAI/Get+Data+From+XML

    There used to be a separate Streaming XML Input step, but I see it is deprecated in PDI 4.1.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  4. #4
    Join Date: Nov 2008 | Posts: 777

    I see the StAX step was added in PDI 4.2.

    http://wiki.pentaho.com/display/EAI/...eam+%28StAX%29

  5. #5
    Join Date: Nov 2011 | Posts: 12

    Thanks. We are still on 4.1.2. I'll look at this once we move to a newer version.

    Thanks again for the help.

  6. #6
    Join Date: Nov 1999 | Posts: 9,729

    It was Jens that blogged about it: http://kettle.bleuel.com/2011/06/24/...tep-in-pdi-42/
    The step was tested with XML files in the tens of GB, but there is no reason it couldn't handle bigger files.

