Hitachi Vantara Pentaho Community Forums

Thread: Input an html file

  1. #1

    Input an html file

    Hello everybody,
    I know this is a very basic question, but I don't know whether I'm missing something in Kettle. I have to get some HTML files over HTTP (as I read, Kettle doesn't retrieve directly from HTTP, so I will implement something in Java to do it), and afterwards I need to feed the HTML into Kettle. But how do I read in the HTML file (all of its lines, I mean), since I want to filter it with a Java class (using JTidy)? I'm doing this work for a teacher, and he said to use Java only where I can't use a Kettle function.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    I recall this thread providing some links and insight.

    Originally posted by neo87:
    as i read kettle doesn't retrieve directly from http
    That's not correct. Kettle uses Commons VFS, so basically you can use a URL as a filename.
    Last edited by marabu; 03-21-2014 at 07:48 AM. Reason: damned typo
    So long, and thanks for all the fish.

  3. #3

    I know, but the search string varies with a session ID... If I save the HTML, how can I import all of its lines into a transformation? That is my dilemma.
    Last edited by neo87; 03-21-2014 at 07:35 AM.

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    There's an input step named "Load file content in memory".
    So long, and thanks for all the fish.
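
For readers unsure what "the entire file in one field" means in practice: the step named above puts the whole file content into a single field instead of producing one row per line. Outside Kettle, the same idea in plain Java looks roughly like the sketch below. The class and method names are hypothetical, and UTF-8 is an assumption; this is not Kettle code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Kettle.
class WholeFileReader {

    // Reads the complete file into one String, line breaks included.
    // Assumes the HTML is UTF-8 encoded; change the charset if it is not.
    static String readWholeFile(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample page to a temporary file, then read it back whole.
        Path tmp = Files.createTempFile("page", ".html");
        Files.write(tmp, "<html>\n<body>hi</body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
        String content = readWholeFile(tmp);
        System.out.println(content.length() + " characters in a single value");
        Files.delete(tmp);
    }
}
```

The single String is what a later step (for example one that runs JTidy) would receive as its input field.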

  5. #5

    This one worked perfectly... Now I can't figure out why I get a null pointer exception when I follow http://rpbouman.blogspot.de/2011/05/...ages-with.html to use JTidy :-/
    I think the field (String content) I set in "Load file content in memory" is not recognized as an input stream in the "User Defined Java Class" step when it runs "Object r[]=getRow();"... What should I do about it? I'm going crazy.

    PS: the null pointer exception seems to be thrown here: " StringReader html = new StringReader((String)r[4]);"

    Originally posted by marabu:
    There's an input step named "Load file content in memory".
    Last edited by neo87; 04-08-2014 at 05:30 AM.

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    Make sure that enough input fields are provided to your scripting step.
    Use action "Show input fields" from the step menu to check this.
    So long, and thanks for all the fish.
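
A note on the exception discussed above: `new StringReader(null)` throws a NullPointerException, so a line like `new StringReader((String)r[4])` fails that way whenever the slot at index 4 holds no value, and `r[4]` itself fails if `r` is null. The sketch below is a plain-Java guard that reports which case occurred. It is a hypothetical helper and not part of the Kettle API; the index and sample values are made up for illustration.

```java
import java.io.StringReader;

// Hypothetical helper for illustration; not part of the Kettle API.
class RowFieldGuard {

    // Returns the String stored at the given index, or throws with a message
    // that names the problem instead of a bare NullPointerException.
    static String fieldAsString(Object[] row, int index) {
        if (row == null) {
            throw new IllegalStateException("No row received (input may be exhausted)");
        }
        if (index < 0 || index >= row.length) {
            throw new IllegalArgumentException(
                "No field at index " + index + "; row has " + row.length + " slots");
        }
        Object value = row[index];
        if (value == null) {
            throw new IllegalArgumentException(
                "Field at index " + index + " is null; check the step's input fields");
        }
        if (!(value instanceof String)) {
            throw new IllegalArgumentException(
                "Field at index " + index + " is a " + value.getClass().getName() + ", not a String");
        }
        return (String) value;
    }

    // Wraps the checked field in a reader, ready to hand to an HTML parser.
    static StringReader readerForField(Object[] row, int index) {
        return new StringReader(fieldAsString(row, index));
    }

    public static void main(String[] args) {
        // Sample row with two fields: a file name and the file content.
        Object[] row = { "file.html", "<html><body>hi</body></html>" };
        System.out.println(fieldAsString(row, 1).length());
        try {
            readerForField(row, 4); // index taken from the thread; this row has no such field
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Comparing the index used in the code with the list from "Show input fields" is the quickest way to see whether the content field really sits where the code expects it.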

  7. #7

    Marabu, thanks for your great support, I finally got it working... I'm posting the Kettle transformation in case someone needs an example of how to do it.


    html-to-xml.ktr


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.