Hitachi Vantara Pentaho Community Forums

Thread: Get data from html page

  1. #1
    Join Date
    Oct 2013
    Posts
    2

    Get data from html page

    Hi guys,
    Can you help me with a question?
    I need a transformation in Kettle, but the input data is a web page (PHP). The page is a simple table (<table><tr><td>NODO</td><td>INTERFACE</td><td>IP</td><td>DESCRIP NODO</td><td>......</table>)
    The interesting part is that I need to run this transformation daily, so I am looking for an automatic way to get the table (a scheduled task without human intervention), transform the data, and send a report ...
    I have almost everything, but I don't know how to get the HTML input.
    I used the "HTML" step to get the web table as a file, but I need something like an HTML parser to be able to use the data.

    If you know how to solve this, I would really appreciate your help. Sorry for my English; it is not my native language.
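
For readers who want to see what "something like an HTML parser" does outside of Kettle, here is a minimal sketch in Python using only the standard library. It is not a Kettle step and not the solution used later in this thread; the class name `TableRows`, the function `parse_table`, and the sample values in `PAGE` are made up for illustration. It assumes every cell and row has a closing tag.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical page content; the real page has more columns.
PAGE = (
    "<table>"
    "<tr><td>NODO</td><td>INTERFACE</td><td>IP</td></tr>"
    "<tr><td>n1</td><td>eth0</td><td>10.0.0.1</td></tr>"
    "</table>"
)
rows = parse_table(PAGE)
```

Here `rows[0]` holds the header cells and `rows[1]` the first data row. Real pages are often not this tidy, which is why a lenient parser is usually needed before the data can be treated as XML.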

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534


    There's a thread holding some pointers for you.

    I used Google to find it.
    Last edited by marabu; 12-06-2014 at 03:52 AM.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Oct 2013
    Posts
    2


    Hi marabu, thanks for your help, it was a light along the way.
    I finished the transformation successfully. But I had another problem after reading the web page: it was with the nodes in the XML. I had 9 nodes at the same level and I didn't know how to get the 9 columns. Example:
    <table> <tbody> <tr>
    <td>DATA_1</td>
    <td>DATA_2</td>
    <td>DATA_9</td>
    </tr>
    <tr>
    <td>Info1</td>
    <td>Info2</td>
    <td>Info9</td>
    </tr> </tbody> </table>
    In the "Get data from XML" step, if I let it get the fields, it only brings back one node (<td>DATA_1</td>) and loses the others. I found my solution when I read this, and it was easy: I only needed to add the other rows to the field list (they become the columns) and set the "XPath" value of each one to the node name plus the node number:
    td[1]
    td[2]
    td[9]
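
To illustrate why the positional XPath works, here is a minimal sketch in Python with the standard library's `xml.etree.ElementTree`, which supports the same `td[N]` position predicate. It is only an analogy to the "Get data from XML" step, not Kettle code; the function name `rows_from_table` is made up, and the sample is cut down to three cells per row, so the columns are `td[1]` to `td[3]` instead of `td[1]` to `td[9]`.

```python
import xml.etree.ElementTree as ET

# Shortened version of the table from this post: three cells per row.
SAMPLE = """
<table><tbody>
  <tr><td>DATA_1</td><td>DATA_2</td><td>DATA_9</td></tr>
  <tr><td>Info1</td><td>Info2</td><td>Info9</td></tr>
</tbody></table>
"""


def rows_from_table(xml_text, n_columns):
    """Loop over every <tr> and read one field per position: td[1], td[2], ..."""
    root = ET.fromstring(xml_text)
    rows = []
    for tr in root.findall(".//tr"):
        # XPath positions start at 1; a missing cell gives None.
        rows.append([tr.findtext(f"td[{i}]") for i in range(1, n_columns + 1)])
    return rows


rows = rows_from_table(SAMPLE, 3)
for row in rows:
    print(row)
```

Each `<tr>` plays the role of the loop node and each `td[N]` becomes one column. A plain `td` path would only pick up the first cell of each row, which matches the behaviour described above.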

    I can't attach my final transformation right now. The upload fails with 500 [IOErrorEvent type="ioError" bubbles=false cancelable=false eventPhase=2 text="Error #2038"], but I'll attach it when I can.
    P.S. To use it, you need to download the jsoup.jar library and put it in the library folder (...\data-integration\lib).

    Thanks buddy!


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.