Hitachi Vantara Pentaho Community Forums

Thread: [ANN]VTD-XML 2.10

  1. #1
    dontcare Guest

    Default [ANN]VTD-XML 2.10

    VTD-XML 2.10 is now released in Java, C#, C and C++. It can be
    downloaded at
    https://sourceforge.net/projects/vtd...mpleware_2.10/.
    This release includes a number of new features and enhancements.

    * The core API of VTD-XML has been expanded. Users can now perform
    cut/paste/insert on an empty element.
    * This release also adds support for a deeper location cache
    for parsing and indexing. This feature is useful for tuning
    application performance when processing various XML documents.
    * The Java version also adds support for processing zip and gzip
    files. Direct processing of XML from HTTP URLs is enhanced.
    * The extended Java version now supports ISO-8859-10~16 encodings.
    * A full-featured C++ port is released.
    * The C version of VTD-XML now makes use of thread-local storage to
    achieve thread safety for multi-threaded applications.
    * There are also a number of bug fixes. Special thanks to Jozef
    Aerts, John Sillers, Chris Tornau and a number of other users for
    input and suggestions.

    --
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com.
    To unsubscribe from this group, send email to kettle-developers+unsubscribe (AT) g...oups (DOT) com.
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en.

  2. #2
    Jens Bleuel Guest

    Default Re: [ANN]VTD-XML 2.10

    I evaluated a lot of XML parsers for very big and complex XML files and
    VTD-XML was not selected since it has limitations in the free GPL
    version, e.g. on the maximum file size:

    For VTD-XML's regular version, the limit depends on whether namespace
    support is enabled.

    * When namespace support is not enabled, the maximum file size is 2 GB.
    * When namespace support is enabled, the maximum file size is 1 GB.

    With VTD-XML's extended edition, the supported maximum file size is 256
    GB, regardless of namespace support.

    I found other approaches, and the finished version of a new step for
    http://jira.pentaho.com/browse/PDI-5313
    "XML: Create a new step that is capable of processing very large and
    complex XML files very fast"
    should be published within a few days.

    I investigated different XML parsers and, depending on the licensing
    models and processing (in-memory or not), found that a StAX parser is
    suitable for this. Different implementations exist, and tests with the
    Java 6 default are very satisfying. But we should also have an option
    to use others (e.g. Woodstox).

    The design goals are:
    1) very fast, with memory use independent of the file size
    2) very flexible, reading different parts of the XML file in different
    ways (and avoiding parsing the file many times)
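
    As a rough illustration of the streaming approach described above, here
    is a minimal StAX sketch using the JDK's built-in javax.xml.stream API.
    The class name StaxSketch, the method forEachText, and the sample
    element names are hypothetical and are not taken from the PDI-5313 step.
    The sketch assumes the target elements contain only text.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not the actual Kettle step implementation.
class StaxSketch {

    /**
     * Streams the document once and hands the text of every element named
     * elementName to the sink. Nothing is buffered here beyond the current
     * element's text, so memory use does not grow with the document size.
     * Assumes the matching elements have text-only content.
     */
    static void forEachText(Reader xml, String elementName, Consumer<String> sink)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    sink.accept(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<rows><row><name>a</name></row><row><name>b</name></row></rows>";
        forEachText(new StringReader(xml), "name", System.out::println);
    }
}
```

    XMLInputFactory.newInstance() picks up whichever StAX implementation is
    available, so an alternative such as Woodstox can typically be swapped
    in without changing this code.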

    More will come on the Wiki and my blog soon.

    Thanks & best regards,
    Jens

  3. #3
    dontcare Guest

    Default Re: VTD-XML 2.10

    A lot of people we know are using VTD-XML, and the GPL is not a problem
    for them.

  4. #4
    Matt Casters Guest

    Default Re: VTD-XML 2.10

    The GPL is a fine license for sure.
    However, from the start the Kettle project picked the LGPL license to make
    it easier for other projects (open and closed source) to embed the software.
    Spending a lot of time and effort to change that is not something I would
    like to do in the near or distant future.

    Regards,

    Matt


    2011/3/4 dontcare <bchang2002 (AT) gmail (DOT) com>

    > A lot of people we know are using VTD-XML, and the GPL is not a problem
    > for them.
    --
    Matt Casters <mcasters (AT) pentaho (DOT) org>
    Chief Data Integration, Kettle founder, Author of Pentaho Kettle
    Solutions<http://www.amazon.com/Pentaho-Kettle-Solutions-Building-Integration/dp/0470635177>
    (Wiley <http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html>)
    Fonteinstraat 70, 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
    Pentaho : The Commercial Open Source Alternative for Business Intelligence


  5. #5
    dontcare Guest

    Default Re: VTD-XML 2.10

    Getting back to the other points made by Jens Bleuel: if he enjoys SAX
    and StAX, then that is fine, but in my humble view, SAX and StAX are
    quite difficult to use. With extended VTD-XML, you can run XPath (the
    full set) over a 256 GB document, which is not only faster but also
    reduces code size by orders of magnitude.
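
    To make the code-size point concrete, here is a small sketch of
    selecting nodes with a single XPath expression. Note that it uses the
    JDK's DOM-backed javax.xml.xpath API as a stand-in, not VTD-XML itself,
    so it says nothing about VTD-XML's speed or its 256 GB limit. The class
    name XPathSketch, the method select, and the sample document are
    hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration using the JDK XPath API (DOM-backed), not VTD-XML.
class XPathSketch {

    /** Evaluates the XPath expression and returns the text of each matching node. */
    static List<String> select(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<rows><row id='1'><name>a</name></row>"
                   + "<row id='2'><name>b</name></row></rows>";
        // The filtering logic lives in the expression, not in event-handling code.
        System.out.println(select(xml, "/rows/row[@id='2']/name"));
    }
}
```

    Because this stand-in builds the whole document in memory, it would not
    be suitable for very large files.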

  6. #6
    Roland Bouman Guest

    Default Re: VTD-XML 2.10

    @dontcare VTD-XML seems really great and I can certainly see the
    benefit of having fast XPath on large XML files. My guess is that if
    the VTD-XML benefits are recognized and people need it enough, someone
    will conjure up a Kettle plugin. That way, the plugin can be
    distributed under the GPL, and people can use it for their ETL in
    combination with LGPL Kettle, as long as they don't distribute a
    Kettle+GPL plugin package.

    At the same time, as Matt pointed out, the GPL does prevent Pentaho
    from including VTD-XML in Kettle. So unless Kettle changes from LGPL
    to GPL, or VTD-XML/Java changes from GPL to LGPL, any discussion
    focused on features and benefits as compared to license-compatible
    solutions is IMO moot.

    kind regards,

    Roland

    On Sat, Mar 5, 2011 at 4:23 AM, dontcare <bchang2002 (AT) gmail (DOT) com> wrote:
    > Getting back to the other points made by Jens Bleuel: if he enjoys SAX
    > and StAX, then that is fine, but in my humble view, SAX and StAX are
    > quite difficult to use. With extended VTD-XML, you can run XPath (the
    > full set) over a 256 GB document, which is not only faster but also
    > reduces code size by orders of magnitude.
