Hitachi Vantara Pentaho Community Forums

Thread: General Streaming XML reader

  1. #1
    Marc Marschner Guest

    Default General Streaming XML reader

    Hello,



    I would like to know if there is currently any development going on in
    the direction of a general streaming XML reader. To my knowledge such a
    component already existed, but was removed from the releases. Could you
    tell me the reason?

    We are currently working with fairly large XML files and use customized
    components employing the StAX API to read the data. We are investigating
    ways to make these components more general (and also to share them once
    they have reached a functional state), but we do not want to reinvent the
    wheel if somebody already has, or is working toward, a similar solution.

    Best Regards,

    Marc Marschner



  2. #2
    Jens Bleuel Guest

    Default Re: General Streaming XML reader

    Hi Marc,

    with the existing Get Data from XML step you can also read big files.
    We deprecated the "old" XML steps because this new step does everything
    we need - if not, please go ahead with your approach and we can
    check how to improve this :-)

    Please see:
    http://wiki.pentaho.com/display/EAI/Get+Data+From+XML

    Handling large files:
    http://wiki.pentaho.com/display/EAI/...ng+Large+Files

    HTH - All the best,
    Jens


  3. #3
    Marc Marschner Guest

    Default RE: General Streaming XML reader

    Hello Jens,
    From what I can see on the pages you linked, this describes exactly what
    we had in mind. Thank you very much.
    Marc


  4. #4
    Wessel Heringa Guest

    Default Re: General Streaming XML reader

    I've been working with the 'Get Data from XML' component together with
    Marc, and we found a problem using the prune path on large XML files
    with a particular structure.
    As far as we know, the 'Get Data from XML' component can handle large
    XML files by using the StAX Java parser. In the GUI it is possible to
    specify the "prune path". By specifying this path the parser cuts the
    XML file into pieces, resulting in better performance. For example, a
    large XML file with the following structure:

    <root>
      <pruneme>
        <product>
          <id>34</id>
        </product>
      </pruneme>
      <pruneme>
        <product>
          <id>35</id>
        </product>
      </pruneme>
      <pruneme>
        <product>
          <id>36</id>
        </product>
      </pruneme>
      ...

    can be processed quickly when the prune path is set to /root/pruneme;
    this works very fast.
    But in our case the XML files contain a lot of 'garbage' information:
    data that is not used in this particular instance, but sits at the end
    of the XML file, similar to this:

    ...
      <pruneme>
        <product>
          <id>36</id>
        </product>
      </pruneme>
      <garbage>
        <product>
          <name> Name of some product</name>
        </product>
        <anotherproduct>
          <name> Name of some other product</name>
        </anotherproduct>
        <yetanotherproduct>
          <name> Name of yet another product</name>
        </yetanotherproduct>
        ...
      </garbage>

    The size of this 'unused' piece of the XML is about as big as the size
    of the used tags in the XML.
    The total size of the XML file is around 700 MB.

    We set the prune path to /root/pruneme. The transformation/component
    runs through the first part of the XML very fast, but it hangs or takes
    a very long time to process the 'garbage' part. We've taken a look at
    the Kettle code, and it seems the StAX parsing is only done partially:
    in "src / kettle / src / org / pentaho / di / trans / steps / getxmldata"
    the GetXMLData class is responsible for handling the XML files. The
    setDocument() method (around line 67) adds an event handler on the
    reader, giving the prune path as an argument. This means that the
    pruning has no added value in parts of the XML that do not contain the
    prune path: until the parser finds a tag whose XPath matches the given
    prune path, it keeps all encountered nodes in memory. This results in
    a lot of nodes being stored for the end of the file, because no
    prune-path tag is found there. In other words, the StAX parser isn't
    really used for the last part of the XML file, resulting in an
    out-of-memory exception.

    I have a demo XML and a KTR file here, but it seems it is impossible to
    attach files in a Google Groups conversation.

    We've run this demo file on a Pentium x64 quad-core with 4 GB of
    memory, on Windows Vista, with Kettle configured to use 4 GB of memory.
    The demo XML was 40 MB (about 5.7% of the actual XML file size).
    The transformation completed after 3000 seconds.
    With 2 GB of memory the transformation completed after 3400 seconds.

    Are you familiar with this behaviour, and is there any development
    going on for this component?

    Wessel
  5. #5
    Jens Bleuel Guest

    Default Re: General Streaming XML reader

    Hi Wessel,

    nice finding - this sounds like a good feature request that can be
    entered here:

    http://jira.pentaho.com

    We could call it something like "Handle large unbalanced XML files".
    You can also attach a compressed test file to the JIRA case - it does
    not have to be a full-blown example for testing :-)

    If you are a customer, please log this in the customer support portal.

    Thanks & best regards,
    Jens



  6. #6
    dontcare Guest

    Default Re: General Streaming XML reader

    There is also a new XML processing model called VTD-XML; it is more
    powerful and advanced than SAX.
    If you experience any performance issues, or want to process large XML
    documents using XPath, check it out:

    http://vtd-xml.sf.net
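
    As a rough, hypothetical illustration (not part of the original post),
    reading the product ids from the sample file above with VTD-XML's XPath
    support could look like the sketch below; the class and method names
    follow the com.ximpleware API as best recalled, and the file name is
    made up, so treat this as an assumption rather than verified code:

    import com.ximpleware.AutoPilot;
    import com.ximpleware.VTDGen;
    import com.ximpleware.VTDNav;

    public class VtdXmlSketch {
        public static void main(String[] args) throws Exception {
            VTDGen gen = new VTDGen();
            // parseFile loads the document and builds the VTD index in memory
            if (!gen.parseFile("big-products.xml", true)) {
                throw new IllegalStateException("XML parse failed");
            }
            VTDNav nav = gen.getNav();
            AutoPilot ap = new AutoPilot(nav);
            ap.selectXPath("/root/pruneme/product/id");
            while (ap.evalXPath() != -1) {
                int text = nav.getText();   // index of the text token of <id>
                if (text != -1) {
                    System.out.println("product id = " + nav.toNormalizedString(text));
                }
            }
        }
    }

    Note that standard VTD-XML keeps the whole document plus its index in
    memory, so for inputs of several hundred megabytes the heap has to be
    sized accordingly.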


  7. #7
    Jens Bleuel Guest

    Default 3.2.3 branch closes, 3.2.4 opened

    FYI: 3.2.3 branch closes, 3.2.4 opened.

    Cheers,
    Jens


  8. #8
    Matt Casters Guest

    Default Re: General Streaming XML reader

    Very interesting. It's unfortunate that it's licensed under the GPL but I guess someone could write a plugin ;-)

    Matt Casters <mcasters (AT) pentaho (DOT) org>
    Chief Data Integration
    Fonteinstraat 70, 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
    Pentaho : The Commercial Open Source Alternative for Business Intelligence



