Hitachi Vantara Pentaho Community Forums

Thread: help! xml input

  1. #1

    help! xml input

    How do I pull out all the trip records in the following structure?

    <database>
    <DataSet>
    <Trip>....</Trip>
    <Trip>....</Trip>
    </DataSet>
    <DataSet>
    <Trip>....</Trip>
    <Trip>....</Trip>
    </DataSet>
    </database>

    Is it even possible?

  2. #2
    Join Date
    May 2006
    Posts
    4,882


    If there are always two Trips, I think so. Otherwise it depends... please supply a real part of such an input file.

    The main thing in Kettle is that all rows have to be of the same structure (think of database tables)... you can't have a variable number of columns in a row.

    Regards,
    Sven

  3. #3


    Here's an example. Note that the number of fields will vary: for example, there are actual.* values instead of estimated.* ones if the trip has completed.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <database><DataSet>
    <Trip>
    <name>2047</name>
    <startDate>2007-10-02T00:00Z</startDate>
    <routingCentre>TH10</routingCentre>
    <driver>Th_18t_04</driver>
    <vehicle>Th_18t_04</vehicle>
    <logon>2007-10-02T04:46Z</logon>
    <travelCost>81.999291301</travelCost>
    <collectedWeight>0</collectedWeight>
    <collectedVolume>0</collectedVolume>
    <deliveredWeight>2277.531</deliveredWeight>
    <deliveredVolume>5033.201</deliveredVolume>
    <areaSummary>Ryde, Sandown, Bembridge</areaSummary>
    <postcodeSummary>PO30-31,33,35-36; TW20</postcodeSummary>
    <productSummary>AC </productSummary>
    <realDriver>Palmer Neil</realDriver>
    <realVehicle></realVehicle>
    <tripStatus>Started</tripStatus>
    <deliveriesMade>0</deliveriesMade>
    <goodDeliveries>0</goodDeliveries>
    <failedDeliveries>0</failedDeliveries>
    <lateDeliveries>0</lateDeliveries>
    <estimatedLateDeliveries>0</estimatedLateDeliveries>
    <lastDataReceivedAt>2007-10-02T04:51Z</lastDataReceivedAt>
    <smsStatus></smsStatus>
    <plannedDriver></plannedDriver></Trip>
    <Visit>
    <trip>2047</trip>
    <site>221667</site>
    <relativePosition>0</relativePosition>
    <visitTimeStatus>Behind</visitTimeStatus>
    <visitCompletionStatus>Planned</visitCompletionStatus>
    <driverWillReturn>false</driverWillReturn>
    <x></x>
    <y></y>
    <productSummary>AC </productSummary>
    <failedAt></failedAt>
    <driverComments></driverComments>
    <earliestArrival>2007-10-02T00:00Z</earliestArrival>
    <latestArrival>2007-10-02T23:59Z</latestArrival></Visit>
    </DataSet>
    <DataSet>
    <Trip>
    <name>2047</name>
    <startDate>2007-10-02T00:00Z</startDate>
    <routingCentre>TH10</routingCentre>
    <estimated.start>2007-10-02T04:46Z</estimated.start>
    <estimated.finish>2007-10-02T15:59:37,999Z</estimated.finish>
    <estimated.departDepotAt>2007-10-02T04:51Z</estimated.departDepotAt>
    <estimated.returnDepotAt>2007-10-02T15:54:37,999Z</estimated.returnDepotAt>
    <estimated.distance>286606</estimated.distance>
    <estimated.nonDriveWorking>PT3H24M</estimated.nonDriveWorking>
    <estimated.driving>PT7H49M37,999S</estimated.driving>
    <estimated.timeOnBreak>PT0S</estimated.timeOnBreak>
    <estimated.waiting>PT0S</estimated.waiting></Trip>

    </DataSet></database>

  4. #4

    I can get 1 trip out of it by setting the elements to database, DataSet, Trip.

    Maybe raise a low-level JIRA anyway and add a file with some more entries.

    Regards,
    Sven

  5. #5


    Yes, but the second trip is the most important :-(

    Do we expect this to work with the XPath XML input step? It should be a matter of asking for //Trip or something like that.
    Secondly, what's the state of 3.1 in SVN? Is the XPath version there? Is it worth me trying it out?
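
    The //Trip idea can be tried outside Kettle with the standard javax.xml.xpath API. The following is only a minimal sketch: the class name TripXPathDemo and the trimmed-down SAMPLE string are made up for illustration, and it says nothing about how the Kettle step is implemented internally.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Kettle.
final class TripXPathDemo {

    // Assumed, shortened sample with the same nesting as the file in this thread.
    static final String SAMPLE =
          "<database>"
        + "<DataSet><Trip><name>2047</name><tripStatus>Started</tripStatus></Trip></DataSet>"
        + "<DataSet><Trip><name>2047</name><estimated.distance>286606</estimated.distance></Trip></DataSet>"
        + "</database>";

    /** Returns every Trip element, whichever DataSet it sits in. */
    static NodeList selectTrips(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            // "//Trip" matches Trip elements at any depth in the document.
            return (NodeList) xpath.evaluate("//Trip", source, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        NodeList trips = selectTrips(SAMPLE);
        System.out.println("Trips found: " + trips.getLength());
    }
}
```

    Because the expression is not tied to a fixed position, it should pick up the Trip in the second DataSet as well as the first.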

  6. #6

    It's OK with the XPath XML step; only the Get Fields functionality doesn't see your second trip properly, but if you add the fields manually it works.

    In 3.1, besides some new steps, no major changes have been introduced yet, so for the moment (31 Dec 2007) it's still pretty stable.

    You can now also find extra plugins at http://wiki.pentaho.org/display/EAI/...ation+Plug-Ins

    Regards,
    Sven

  7. #7


    Hi Tim,
    I tested with your sample and it works fine (please see attached picture).
    The getXMLData step is available in 3.1.
    For 2.5.x there is a plugin available; see
    http://wiki.pentaho.org/display/EAI/...ation+Plug-Ins

    Rgds

    Samatar

  8. #8


    Thanks guys. Just got the latest version. Can you confirm I don't need to build it if I'm not changing it?

  9. #9

    If you check out from SVN you need to build. You need JDK 5 or higher; run "ant" in the trunk. The result will be in the distrib directory.

    Regards,
    Sven

  10. #10


    Is this my fault or an error in the svn?

    [javac] Compiling 336 source files to C:\programmingExperiments\kettle3.1\classes-ui
    [javac] C:\programmingExperiments\kettle3.1\src-ui\org\pentaho\di\ui\trans\steps\setvariable\SetVariableDialog.java:208: cannot find symbol
    [javac] symbol : method getDefaultValue()
    [javac] location: class org.pentaho.di.trans.steps.setvariable.SetVariableMeta
    [javac] String tvv = input.getDefaultValue()[i];
    [javac] ^
    [javac] C:\programmingExperiments\kettle3.1\src-ui\org\pentaho\di\ui\trans\steps\setvariable\SetVariableDialog.java:244: cannot find symbol
    [javac] symbol : method getDefaultValue()
    [javac] location: class org.pentaho.di.trans.steps.setvariable.SetVariableMeta
    [javac] input.getDefaultValue()[i] = item.getText(4);
    [javac] ^

  11. #11


    It's my fault... can you please try checking out from SVN again?

    Samatar

  12. #12


    It builds now, thanks. I'll let you know how I get on.

  13. #13


    Nice!
    You wouldn't consider parsing ISO standard dates in the XML, would you? It could be quite handy.

  14. #14

    Use a conversion mask; you can put whatever you want as the date mask.

    Regards,
    Sven

  15. #15


    I've just tried.
    My ISO dates can be either
    2007-10-02T00:00Z
    or
    2007-10-02T19:45:03Z
    but the precision can vary, and they could be missing the time section altogether.

    But even when I know I've got an exact match and put in the string

    yyyy-MM-ddThh:mmZ

    as my format I get the error message:

    start_date String : couldn't convert string [2007-10-02T00:00Z] to a date using format [yyyy/MM/dd HH:mm:ss.SSS]

    Because the ISO format is variable in layout, you probably need to parse it explicitly rather than using a mask. For example, 2007-10-02 would be equally acceptable.
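
    One way to handle the varying precision is to try several masks in turn, most specific first. The sketch below uses plain java.text.SimpleDateFormat and assumes (this is not confirmed in the thread) that the conversion mask follows the same rules. Under those rules the literal letters T and Z must be quoted, and HH is the 24-hour field while hh is the 12-hour one. The class name IsoDateParser and the list of masks are illustrative only; the list covers the layouts seen in this thread, not the whole ISO 8601 standard.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Kettle.
final class IsoDateParser {

    // Candidate masks, most specific first. Literal letters are quoted.
    private static final String[] MASKS = {
        "yyyy-MM-dd'T'HH:mm:ss,SSS'Z'",
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        "yyyy-MM-dd'T'HH:mm'Z'",
        "yyyy-MM-dd"
    };

    /** Parses the text with the first mask that consumes the whole string. */
    static Date parse(String text) {
        for (String mask : MASKS) {
            SimpleDateFormat format = new SimpleDateFormat(mask, Locale.ROOT);
            format.setLenient(false);
            // The trailing Z is treated as a literal, so fix the zone to UTC here.
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(text, position);
            // Reject partial matches such as "yyyy-MM-dd" on a full timestamp.
            if (parsed != null && position.getIndex() == text.length()) {
                return parsed;
            }
        }
        throw new IllegalArgumentException("Unrecognised ISO date: " + text);
    }

    public static void main(String[] args) {
        System.out.println(parse("2007-10-02T00:00Z").getTime());
        System.out.println(parse("2007-10-02T19:45:03Z").getTime());
        System.out.println(parse("2007-10-02").getTime());
    }
}
```

    Checking that the whole string was consumed matters, because SimpleDateFormat otherwise accepts a prefix match and silently ignores the rest.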


    On a different matter, Spoon 3.1 seems to be in "un-turbo" mode. Is this because it's the SVN version, or is there some switch somewhere? It's really slow (but good enough for what I'm doing).

  16. #16

    The conversion mask you see in the error message is the default one rather than yours; that may still be worth a JIRA (but not the explicit parsing).

    For the "un-turbo" mode... compared to what? Do you have examples?

    Regards,
    Sven

  17. #17


    On further testing I think it's just the new XML step, and probably something to do with XPath.

    On the speed: I have two versions of the trip conversion that look at only the first set of records. Using the old XML input I pull 69 records out of a 5 MB XML file, perform a number of steps and write them to the database in 0.3 s.

    With the new version the same action takes 23 seconds, so roughly 75 times slower.

  18. #18

    Raise a JIRA for the XML xpath step

    Regards,
    Sven

  19. #19


    I spent a few minutes tinkering with the XPath XML step and I've got the time down from 35 s (different machine) to 0.6 s by a bit of an abuse of function.
    The problem is that the step uses XPath to identify the "records" and then carries out further XPath operations to extract each field from each identified record.

    If the path from the record to the field is complex, this is the "obvious" way to do it, but in computational terms it is hugely expensive. Because my records are nice and consistent, I can simply inspect the children of the "record" node to extract my data using a straight getChildNodes operation.

    So my huge speedup comes from trashing the generality of the solution for something that works for me.
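
    The child-walking idea can be sketched with the standard DOM API. This is not the patch discussed in this thread and not Kettle's code; the class and method names (ChildNodeExtractor, readFields, firstTrip) are hypothetical, and it assumes every field is a direct child element of the record node.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Kettle.
final class ChildNodeExtractor {

    /** Collects the direct child elements of a record as name/value pairs, in document order. */
    static Map<String, String> readFields(Node record) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        NodeList children = record.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes between the elements.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(child.getNodeName(), child.getTextContent());
            }
        }
        return fields;
    }

    /** Parses the XML and returns the first Trip element, or null if there is none. */
    static Node firstTrip(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return document.getElementsByTagName("Trip").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<database><DataSet><Trip>\n<name>2047</name>\n"
                + "<routingCentre>TH10</routingCentre>\n</Trip></DataSet></database>";
        System.out.println(readFields(firstTrip(xml)));
    }
}
```

    Each record is visited once and no per-field XPath expression is evaluated, which is where the saving would come from. Missing fields simply do not appear in the map, so a caller still has to pad them to keep every row the same shape.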

    And it's an inelegant hack.

    I don't really have the time to do anything neat about this and learn all about dialogs and SWT and so on, so what, if anything, should I do with it?

    Tim

  20. #20

    Attach it to the JIRA and describe it as an inelegant hack.

    Regards,
    Sven

  21. #21

    Latest getXMLData much faster

    The latest version from 3.1 is much faster than before.

  22. #22


    Hi Tim,

    We made some changes...
    Does the speed look reasonable to you?

    Rgds

    Samatar
