Hitachi Vantara Pentaho Community Forums

Thread: Any Streaming XML Input gurus here?

  1. #1

    Default Any Streaming XML Input gurus here?

    In the attached XML document I am after the following fields:

    Found in GISF_ENTITY/ENTITY
    ENTITY_ID
    ENTITY_PUBLISHED_NAME

    Found in GISF_ENTITY/ENTITY/ENTITY_SECTOR
    SECTOR

    Found in (And could be multiple rows which need to be combined into a new rowset) GISF_ENTITY/ENTITY/ENTITY_RATING_GROUP/ENTITY_RATINGS

    TYPE_OF_RATING
    CREDIT_RATING
    LONG_TERM_RATING_DATE
    LONG_TERM_RATING
    LONG_TERM_CREDITWATCH
    LONG_TERM_OUTLOOK
    SHORT_TERM_RATING_DATE
    SHORT_TERM_RATING
    SHORT_TERM_CREDITWATCH
    SHORT_TERM_OUTLOOK

    The trouble is that I do not want the various credit ratings for an Entity to come out as additional fields. I have been playing with the Location and Elements settings on the Fields tab and cannot get it to work. There can be more than two <ENTITY_RATINGS> blocks per Entity, but most of the time there are just two.

    Down to this last challenge in my feasibility study, which if successful should sell them on this product.

    Thanks in advance.

    Marc Pike
    Attached Files
    Last edited by marc.pike; 11-09-2007 at 06:50 PM. Reason: clarity

  2. #2

    Post Xml

    Something like this (see attached) should solve your problem?

    Rgds

    Samatar
    Last edited by shassan2; 05-15-2008 at 12:33 PM.

  3. #3

    Default

    Thanks so much Samatar, but I wonder if this will solve all my issues...

    The ENTITY_RATINGS_GROUP is in some cases a repeating band, meaning I could have multiple TYPE_OF_RATING values per ENTITY.

    If you could tell me how to set up this repeating logic, that would be awesome.

    In my attempts I have two Locations set up:

    Ep=GISF_ENTITY/1
    Ep=ENTITY/1

    I also have no elements set up in the Fields tab, but I am only getting the first ENTITY_RATINGS_GROUP of each ENTITY, and I need to get every repeating one.

    Thanks again for your help.

    Marc Pike

  4. #4

    Smile

    Hi Marc,

    See attached picture :-)

    Rgds

    Samatar
    Last edited by shassan2; 01-25-2008 at 01:53 PM.

  5. #5

    Default

    Samatar, what transformation is that?

    I am trying to use a Streaming XML Input. The example I provided is only a few rows, but I have over 400 MB of files to import, so I have to rely on SAX.
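For readers outside Kettle, the streaming idea behind this requirement can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`. This is only an illustration, not how the Streaming XML Input step is implemented. The sample document is an assumption: its element names come from this thread, but the second entity (ID 2001) and the nesting are invented, since the attached file is not reproduced here. Each `ENTITY` is processed when its end tag arrives and is then cleared, so memory use stays roughly flat however large the file is.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are taken from the thread,
# the second entity and all nesting details are assumptions.
SAMPLE = b"""<GISF_ENTITY>
  <ENTITY>
    <ENTITY_ID>1894</ENTITY_ID>
    <ENTITY_RATING_GROUP>
      <ENTITY_RATINGS><TYPE_OF_RATING>Local Currency</TYPE_OF_RATING></ENTITY_RATINGS>
      <ENTITY_RATINGS><TYPE_OF_RATING>Foreign Currency</TYPE_OF_RATING></ENTITY_RATINGS>
    </ENTITY_RATING_GROUP>
  </ENTITY>
  <ENTITY>
    <ENTITY_ID>2001</ENTITY_ID>
    <ENTITY_RATING_GROUP>
      <ENTITY_RATINGS><TYPE_OF_RATING>Local Currency</TYPE_OF_RATING></ENTITY_RATINGS>
    </ENTITY_RATING_GROUP>
  </ENTITY>
</GISF_ENTITY>"""


def stream_rating_types(source):
    """Yield (ENTITY_ID, TYPE_OF_RATING) pairs without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "ENTITY":
            entity_id = elem.findtext("ENTITY_ID")
            for ratings in elem.iterfind("ENTITY_RATING_GROUP/ENTITY_RATINGS"):
                yield entity_id, ratings.findtext("TYPE_OF_RATING")
            # Drop the finished ENTITY subtree so memory does not grow.
            elem.clear()


# `source` may be a file name or any binary file-like object.
pairs = list(stream_rating_types(io.BytesIO(SAMPLE)))
```

With a real file, passing the path instead of the `io.BytesIO` wrapper should work the same way.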

    Also, those images are too fuzzy when I try to view them.

    mpike@romecorp.com is my email addy if this is easier for you.

    Thanks for the help, trying to avoid having to go the XSL route if possible.

    Regards,

    Marc Pike

  6. #6

    Default

    Also, my XML Input and Streaming XML Input steps do not look anything like that...

  7. #7

    Smile

    Hi Marc,

    It's not XML Input or Streaming XML Input. :-)
    It's another step.

    In the plugins section, there is an XMLInputPath plugin step to download (at this time only for 2.5.x).

    http://wiki.pentaho.org/display/EAI/...ation+Plug-Ins


    This step uses XPath to extract data from a file (by looping on an XPath).

    In your case, the XPath is: GISF_ENTITY/ENTITY/ENTITY_RATING_GROUP/ENTITY_RATINGS
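To make "looping on an XPath" concrete, here is a minimal Python sketch using the limited XPath support in `xml.etree.ElementTree`. It is not the plugin's code, and the sample document and its rating values are made up for illustration. Note that ElementTree paths are relative to the root element, so the leading `GISF_ENTITY/` is dropped. Each node matched by the loop path becomes one row, with the node's child elements as fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the LONG_TERM_RATING values are invented.
SAMPLE = """<GISF_ENTITY>
  <ENTITY>
    <ENTITY_ID>1894</ENTITY_ID>
    <ENTITY_RATING_GROUP>
      <ENTITY_RATINGS>
        <TYPE_OF_RATING>Local Currency</TYPE_OF_RATING>
        <LONG_TERM_RATING>AA</LONG_TERM_RATING>
      </ENTITY_RATINGS>
      <ENTITY_RATINGS>
        <TYPE_OF_RATING>Foreign Currency</TYPE_OF_RATING>
        <LONG_TERM_RATING>A+</LONG_TERM_RATING>
      </ENTITY_RATINGS>
    </ENTITY_RATING_GROUP>
  </ENTITY>
</GISF_ENTITY>"""

# The loop XPath from the post, written relative to the root element.
LOOP_PATH = "ENTITY/ENTITY_RATING_GROUP/ENTITY_RATINGS"


def loop_on_xpath(xml_text, loop_path=LOOP_PATH):
    """Return one dict per node matched by loop_path (child tag -> text)."""
    root = ET.fromstring(xml_text)  # root is <GISF_ENTITY>
    rows = []
    for node in root.findall(loop_path):
        rows.append({child.tag: (child.text or "").strip() for child in node})
    return rows


rating_rows = loop_on_xpath(SAMPLE)
```

Each `ENTITY_RATINGS` element yields its own row here, which is the repeating behaviour asked about earlier in the thread.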

    ------

    I made some changes to the XML Input step:
    - added the ability to parse any XML stream (from a field created in a previous step)
    - option to validate the XML or not
    - option to be namespace-aware or not
    ...

    That's why I created a new plugin called "getXMLData".

    At this time, it is only available for the PDI 2.5.x series.

    It's not very hard to port it to 3.0.

    Rgds

    Samatar

  8. #8

    Default

    What is involved in porting it to 3.0?

    Thanks,

    Marc Pike

  9. #9

    Default

    BTW, I downloaded the source, compiled it and deployed it, and Kettle doesn't load. I realize there must be more to converting a 2.5 plugin to 3.0 than simply compiling it, but if you have the time I sure would like to have this step in 3.0.

    Regards,

    Marc Pike

  10. #10

    Default

    Another conclusion: Kettle has no ability to loop on sub-elements using either the Streaming XML Input or XML Input, right?

    Except for this custom step?

    I am surprised at that. I am sure it is hard, but XML files are generally not flat files, right?

    Thanks in advance.

    Marc Pike

  11. #11

    Default

    Hi Marc,
    I will port it to 3.0 (already planned) :-)
    As you know, there are many XML processing methods:
    XML Input and XML Input SAX cover many needs... getXMLData (which uses XPath) will probably
    cover some others...

    I have not really used those steps in the past, so I am not the right person to talk about them.

    Anyway, for getXMLData, I will produce a plugin for 3.0 and post it (I will probably send it to you
    by private mail if you don't mind).

    Take care

    Samatar

  12. #12

    Default

    Would like that Samatar, thanks a lot!

    Does your Transformation handle large files?

    Regards,

    Marc Pike

  13. #13

    Default

    Does your Transformation handle large files?
    --> Should do the trick..

    Samatar

  14. #14

    Default

    Hi Samatar/Marc,

    I am in exactly the same position as Marc. I am evaluating Kettle, and XML files are extremely important to us. I am also looking at Samatar's XML Input Path (using 2.5).

    I wanted to add something to Marc's example:

    Marc's data can be retrieved from the XML file with three XPaths, not one:

    1. /GISF_ENTITY/ENTITY/ENTITY_ID
    2. /GISF_ENTITY/ENTITY/ENTITY_SECTOR/SECTOR and
    3. /GISF_ENTITY/ENTITY/ENTITY_RATING_GROUP/ENTITY_RATINGS.

    So basically he needs to get rows like the following:

    EntityId  EntityPublishedName  Sector   TypeofRating      etc...
    1894      Mickey Mouse         Globiss  Local Currency    etc...
    1894      Mickey Mouse         Globiss  Foreign Currency  etc...

    It is in a way like "denormalizing" the XML document.
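As a rough illustration of that "denormalizing" outside Kettle, the Python sketch below repeats the parent `ENTITY` fields on every `ENTITY_RATINGS` row. It is an assumption-laden example, not a Kettle or plugin feature: the document layout is guessed from the element names given in the first post, and only the values 1894, Mickey Mouse, Globiss and the two rating types come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout assembled from the element names in post #1.
SAMPLE = """<GISF_ENTITY>
  <ENTITY>
    <ENTITY_ID>1894</ENTITY_ID>
    <ENTITY_PUBLISHED_NAME>Mickey Mouse</ENTITY_PUBLISHED_NAME>
    <ENTITY_SECTOR><SECTOR>Globiss</SECTOR></ENTITY_SECTOR>
    <ENTITY_RATING_GROUP>
      <ENTITY_RATINGS><TYPE_OF_RATING>Local Currency</TYPE_OF_RATING></ENTITY_RATINGS>
      <ENTITY_RATINGS><TYPE_OF_RATING>Foreign Currency</TYPE_OF_RATING></ENTITY_RATINGS>
    </ENTITY_RATING_GROUP>
  </ENTITY>
</GISF_ENTITY>"""


def denormalize(xml_text):
    """Return one flat row per ENTITY_RATINGS, carrying the parent ENTITY fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for entity in root.findall("ENTITY"):
        # Fields read once per entity and copied onto every rating row.
        parent = {
            "ENTITY_ID": entity.findtext("ENTITY_ID"),
            "ENTITY_PUBLISHED_NAME": entity.findtext("ENTITY_PUBLISHED_NAME"),
            "SECTOR": entity.findtext("ENTITY_SECTOR/SECTOR"),
        }
        for ratings in entity.findall("ENTITY_RATING_GROUP/ENTITY_RATINGS"):
            row = dict(parent)
            row.update({child.tag: child.text for child in ratings})
            rows.append(row)
    return rows


flat_rows = denormalize(SAMPLE)
```

The design choice is simply a nested loop: the outer loop reads the entity-level fields, the inner loop emits one row per rating, so an entity with no ratings produces no rows at all in this sketch.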

    I would really appreciate it if you could let me know whether this is possible to achieve and how, because my data is all about "denormalizing" XML files.

    Thanks and sorry for the interference.

    magi

  15. #15

    Default

    Would like to know this as well, good question...

    Marc Pike

  16. #16

    Default

    Hello,
    have you succeeded in running this plug-in in 3.0?
    regards
    Nara Shiri

  17. #17

    Default

    No... steps require a conversion from 2.5 to 3.0, and as you can see on http://wiki.pentaho.org/display/EAI/...ation+Plug-Ins the XML Input Path step is not yet converted (as of December 5th, 2007 anyway).

    Regards,
    Sven

  18. #18

    Default

    Ok, thanks.
    I just thought this was a matter of a few small changes in the sources.
