Hitachi Vantara Pentaho Community Forums

Thread: Get Data from XML using tokens.

  1. #1
    Join Date
    Aug 2013
    Posts
    16

    Default Get Data from XML using tokens.

    Hello all. I am a mechanical engineer turned developer/BA consultant.

    Pentaho is fairly new to me, and for the life of me I cannot get it to read this XML file properly and pull out the URL.

    I am looking to retrieve the following from the XML:
    -Name
    -Region Name
    -URL link to the kml

    Every attempt so far has returned 12 duplicate instances of the first entry. I've tried looking into tokens, but to no avail.

    Please advise.

    Thank you!
    STG_GIS_FIRE_LOAD.ktr
    Fire_Hotspot.xml

  2. #2
    Join Date
    Nov 2008
    Posts
    777

    Default

    Another mechanical engineer turned developer...who woulda thunk it!

    The problem is that you have a repeating element (the NetworkLink element) inside another repeating element (the inner Folder element), but the Get Data from XML step is set up to loop over only one repeating element. I think the easiest way to handle this is to chain two Get Data from XML steps: the first passes the inner repeating XML fragment to the second, which then processes the inner repeat.
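
    For readers without Spoon at hand, the two-pass idea can be sketched outside PDI. This is only an illustration in Python, not the attached transformation: the sample document, its example.com URLs, and the function names first_pass and second_pass are all made up, the Folders are shown flat rather than nested, and the KML namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics the nested layout described above.
# Element names follow KML conventions; the values and URLs are invented.
SAMPLE = """\
<Document>
  <Folder>
    <name>Americas</name>
    <NetworkLink>
      <name>Alaska</name>
      <Link><href>http://example.com/alaska.kml</href></Link>
    </NetworkLink>
    <NetworkLink>
      <name>Canada</name>
      <Link><href>http://example.com/canada.kml</href></Link>
    </NetworkLink>
  </Folder>
  <Folder>
    <name>Europe</name>
    <NetworkLink>
      <name>Northern Europe</name>
      <Link><href>http://example.com/europe.kml</href></Link>
    </NetworkLink>
  </Folder>
</Document>
"""


def first_pass(xml_text):
    """Outer repeat: one row per Folder, carrying the Folder as an XML string."""
    root = ET.fromstring(xml_text)
    for folder in root.findall("Folder"):
        region = folder.findtext("name")
        # Serialize the whole node so the next pass still receives real XML.
        folder_xml = ET.tostring(folder, encoding="unicode")
        yield region, folder_xml


def second_pass(rows):
    """Inner repeat: one row per NetworkLink inside each Folder fragment."""
    for region, folder_xml in rows:
        folder = ET.fromstring(folder_xml)
        for link in folder.findall("NetworkLink"):
            yield region, link.findtext("name"), link.findtext("Link/href")


rows = list(second_pass(first_pass(SAMPLE)))
for row in rows:
    print(row)
```

    Each output row pairs the region name from the outer loop with the name and URL from the inner loop, which is the combination the original question asks for.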
    Attached Files
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  3. #3
    Join Date
    Aug 2013
    Posts
    16

    Default

    Thanks so much. I knew it had to do with the nested structure of the XML file.

    However, I am now having trouble getting from the first step to the second. Spoon reports: "Error on line 2 of document : Content is not allowed in prolog. Nested exception: Content is not allowed in prolog."

    Any thoughts or guidance on how to address that?

  4. #4
    Join Date
    Nov 2008
    Posts
    777

    Default

    That message usually shows up for me when the Get Data from XML step is not receiving actual XML. It is reporting that there is data (content) before the start of the XML elements, when in reality there is no XML at all, just data. Are you sure you configured the first step exactly the way I did in my example? The "folder_xml" field must have its "Result type" set to "Single node". If I set it to "Value of", I get the same error message you got.
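
    The difference between the two result types can be imitated in a few lines of Python. This is a rough analogy rather than PDI's actual behaviour: the fragment and the helper parses_as_xml are invented for the illustration, and Python's parser words its error differently from the Java "Content is not allowed in prolog" message.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for one Folder from the source file.
doc = ET.fromstring(
    "<Folder><name>Americas</name>"
    "<NetworkLink><name>Alaska</name></NetworkLink></Folder>"
)

# Roughly what "Value of" hands downstream: the text content only, no tags.
as_value = "".join(doc.itertext())

# Roughly what "Single node" hands downstream: the element serialized as XML.
as_node = ET.tostring(doc, encoding="unicode")


def parses_as_xml(text):
    """Return True if text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(repr(as_value), parses_as_xml(as_value))
print(repr(as_node), parses_as_xml(as_node))
```

    The text-only value has no root element, so a second parser has nothing to work with, while the serialized node parses cleanly.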

  5. #5
    Join Date
    Aug 2013
    Posts
    16

    Default

    Ok, I got that. Unfortunately, when I download the ktr file that you created and run it unchanged, I still get the error.

    I understand that the second step will fail if it receives plain data rather than XML.

    I'm not sure if it helps, but I am using Spoon 4.1 GA.


    2013/10/16 16:26:19 - Get data from KML Fire Hotspot.0 - Finished processing (I=3, O=0, R=0, W=3, U=0, E=0)
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : Unexpected Error : org.pentaho.di.core.exception.KettleException:
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.dom4j.DocumentException: Error on line 2 of document : Content is not allowed in prolog. Nested exception: Content is not allowed in prolog.
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : Error on line 2 of document : Content is not allowed in prolog. Nested exception: Content is not allowed in prolog.
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.core.exception.KettleException:
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.dom4j.DocumentException: Error on line 2 of document : Content is not allowed in prolog. Nested exception: Content is not allowed in prolog.
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : Error on line 2 of document : Content is not allowed in prolog. Nested exception: Content is not allowed in prolog.
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.trans.steps.getxmldata.GetXMLData.setDocument(GetXMLData.java:174)
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.trans.steps.getxmldata.GetXMLData.ReadNextString(GetXMLData.java:396)
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.trans.steps.getxmldata.GetXMLData.getXMLRowPutRowWithErrorhandling(GetXMLData.java:705)
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.trans.steps.getxmldata.GetXMLData.getXMLRow(GetXMLData.java:691)
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.trans.steps.getxmldata.GetXMLData.processRow(GetXMLData.java:648)
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
    2013/10/16 16:26:19 - Get data from KML Fire Hotspot 2.0 - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : java.lang.Thread.run(Unknown Source)

  6. #6
    Join Date
    Aug 2013
    Posts
    16

    Default

    Also, thank you so much for all this help. These last few exchanges have been the most productive part of my day on this. You are great.

  7. #7
    Join Date
    Nov 2008
    Posts
    777

    Default

    Not sure what to tell you. It works perfectly for me on version 4.4 GA. Can you upgrade?

    What do you see when you preview step 1?
    Last edited by darrell.nelson; 10-16-2013 at 04:52 PM. Reason: lost internet connection on a power flicker

  8. #8
    Join Date
    Aug 2013
    Posts
    16

    Default

    I also have an instance of 4.2 GA on the VM, but unfortunately I cannot upgrade at the moment.

    When I preview step 1, everything looks good. I can see that it pulled the correct regions, and the rest of the XML is in the folder_xml field.

    Here is one row, copied and pasted directly:


    Americas
    0
    0


    Alaska
    0
    MODIS Hotspots for Alaska. <br><br> For more information, visit FIRMS home page @ https://earthdata.nasa.gov/data/nrt-data/firms
    <br>
    1

    http://firms.modaps.eosdis.nasa.gov/...Alaska_24h.kml
    onInterval
    7200




    USA (Lower 48) and Hawaii
    0
    MODIS Hotspots for USA (Lower 48) and Hawaii. <br><br> For more information, visit FIRMS home page @ https://earthdata.nasa.gov/data/nrt-data/firms
    <br>
    1

    http://firms.modaps.eosdis.nasa.gov/...Hawaii_24h.kml
    onInterval
    7200



    Canada
    0
    MODIS Hotspots for Canada. <br><br> For more information, visit FIRMS home page @ https://earthdata.nasa.gov/data/nrt-data/firms
    <br>
    1

    http://firms.modaps.eosdis.nasa.gov/...Canada_24h.kml
    onInterval
    7200



    Central America
    0
    MODIS Hotspots for Central America . <br><br> For more information, visit FIRMS home page @ https://earthdata.nasa.gov/data/nrt-data/firms
    <br>
    1

    http://firms.modaps.eosdis.nasa.gov/...merica_24h.kml
    onInterval
    7200



    South America
    0
    MODIS Hotspots for South America. <br><br> For more information, visit FIRMS home page @ https://earthdata.nasa.gov/data/nrt-data/firms
    <br>
    1

    http://firms.modaps.eosdis.nasa.gov/...merica_24h.kml
    onInterval
    7200





    Could the problem be, as you said, that the XML isn't being passed successfully to step 2 through the fields?

  9. #9
    Join Date
    Nov 2008
    Posts
    777

    Default

    Yeah, pasting XML directly isn't much help. Can you "Go Advanced" and then enclose it in CODE tags? Right now I'm inclined to believe that you are seeing a bug that was fixed in 4.4 GA.

    Here's another version that does it in one pass. I didn't think it was going to work, but it seems to.
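
    The attachment is not reproduced in this thread, so the following is not necessarily how that transformation works. It is just one way a single pass could look, sketched in Python: loop over the inner NetworkLink elements and reach back up to the enclosing Folder for the region name. The sample data, its example.com URLs, and the name parent_of are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names follow KML conventions, values are invented.
SAMPLE = """\
<Document>
  <Folder>
    <name>Americas</name>
    <NetworkLink>
      <name>Alaska</name>
      <Link><href>http://example.com/alaska.kml</href></Link>
    </NetworkLink>
    <NetworkLink>
      <name>Canada</name>
      <Link><href>http://example.com/canada.kml</href></Link>
    </NetworkLink>
  </Folder>
</Document>
"""

root = ET.fromstring(SAMPLE)

# ElementTree has no parent pointer, so build a child -> parent lookup once.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Single loop over the inner element; the region comes from its parent Folder.
rows = [
    (
        parent_of[link].findtext("name"),
        link.findtext("name"),
        link.findtext("Link/href"),
    )
    for link in root.iter("NetworkLink")
]

for row in rows:
    print(row)
```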
    Attached Files

  10. #10
    Join Date
    Aug 2013
    Posts
    16

    Default

    Thank you again for all your help.

    I was able to resolve the earlier issues with reading the KML. However, while processing the newly downloaded KML files, I am running into a problem that may be encoding-related.

    I've been trying to read one of the KML files as UTF-8, and the read fails when it reaches "Date Acquired : 2013xB8-10-15 ".

    The "xB8" is what is causing the errors. I have tried several different encodings, but I still cannot read the KML file successfully.

    Any thoughts?

    I have attached a sample KML file that has the same structure as the rest of them. I am currently using a "Get XML data" step and passing the fields, but it fails with the error quoted at the end of this post.

    KML file sample.zip

    " Invalid byte 1 of 1-byte UTF-8 sequence. Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence."
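
    One way to investigate an error like this outside PDI is to look for the bytes that are not valid UTF-8 before handing the file to an XML parser. The sketch below is Python and rests on assumptions: the byte string is a made-up stand-in for the real file, find_invalid_utf8 is a hypothetical helper, and dropping the offending byte is only reasonable if it really is stray junk, which the date context suggests but the thread does not confirm.

```python
# Made-up stand-in for the problem line; 0xB8 is the byte reported above.
raw = b"<description>Date Acquired : 2013\xb8-10-15</description>"


def find_invalid_utf8(data):
    """Return (offset, byte) pairs for bytes that cannot be decoded as UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as err:
            offset = pos + err.start
            bad.append((offset, data[offset]))
            pos = offset + 1  # resume scanning just past the bad byte
    return bad


bad_bytes = find_invalid_utf8(raw)
for offset, value in bad_bytes:
    print(f"invalid byte 0x{value:02X} at offset {offset}")

# Assumption: the byte is junk, so decode while silently dropping it.
cleaned = raw.decode("utf-8", errors="ignore")
print(cleaned)
```

    If the byte turns out to be meaningful in some single-byte encoding, decoding with that encoding would be the better fix than discarding it.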
    Last edited by ekcekc; 10-18-2013 at 03:24 PM.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.