Hitachi Vantara Pentaho Community Forums

Thread: How to download via HTTP several files in one job?

  1. #1

    How to download via HTTP several files in one job?

    Hi!

    I run a job with a transformation that outputs rows. The field named "file" contains the URL of the file to be downloaded.

    The transformation sends those rows to an HTTP step that takes the URLs row-by-row from the "file"-field.


    And that works fine - I can see the files being downloaded - but I just can't find a way to store all those downloaded files in some folder.

    In the end I am always left with the last file listed. !?

    Please have a look at my simplified example job. How can I store both listed zip files somewhere?

    Thanks in advance

    Raffael
    Last edited by joyofdata; 06-11-2013 at 05:02 PM.

  2. #2
    Join Date: Jun 2012; Posts: 5,534

    Quote: Originally posted by joyofdata
    In the end I am always left with the last file listed.
    No wonder, since you always give the same target filename.
    While you can pick up the source URL from a result field, you need a variable for the target filename.
    You can avoid the hassle by letting the transformation store the files.
    Have a look at HTTP GET in disguise ...
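    To illustrate why a fixed target filename keeps only the last download, here is a minimal sketch in plain JavaScript. It is not the code from the attachment; the helper name `targetFileFor`, the example URLs, the folder `/tmp/dl` and the fallback name are all assumptions made for illustration. The idea is simply to derive a different local path from each source URL.

```javascript
// Hypothetical helper: derive a distinct local path for each URL so that
// successive downloads do not overwrite one another.
function targetFileFor(url, targetDir) {
  // Drop any query string or fragment, then keep the last path segment.
  var path = url.split(/[?#]/)[0];
  var name = path.substring(path.lastIndexOf("/") + 1);
  // Fall back to a fixed name if the URL ends in a slash (assumption).
  if (name === "") {
    name = "download.bin";
  }
  return targetDir + "/" + name;
}

// Example URLs and folder are made up for illustration.
var urls = [
  "http://example.com/data/first.zip",
  "http://example.com/data/second.zip?session=1"
];
var targets = urls.map(function (u) { return targetFileFor(u, "/tmp/dl"); });
console.log(targets.join("\n"));
```

    With one path per row, each file lands in its own place instead of replacing the previous one.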
    So long, and thanks for all the fish.

  3. #3

    Hi marabu

    Quote: Originally posted by marabu
    No wonder, since you always give the same target filename.
    While you can pick up the source URL from a result field, you need a variable for the target filename.
    You can avoid the hassle by letting the transformation store the files.
    Have a look at HTTP GET in disguise ...
    This is pretty hacky. I mean, downloading files is a very common part of an ETL process, so there should be a simple way to do it. And the HTTP job step seems to do something like it, given that you can run it for each row.

    So I would actually be really interested in a way to do it using the HTTP step for jobs. How can I use a variable in "Target file" that changes with every row? This seems like something worth knowing in general, but I have no clue how this can be done.

    In general the settings in the "Webserver reply" section remain a secret to me ... "Add filename to result filesname" - this sounds like I can change the target filename depending on "something".

    Please tell me more

    Would it be possible to row-wise copy the target file to another file depending on a field?

  4. #4
    Join Date: Jun 2012; Posts: 5,534

    To each his own. Here is a demo showing how it's done using an HTTP job entry.
    There may already be a demo in the samples, but I didn't check.

    Quote: Originally posted by joyofdata
    This is pretty hacky.
    Well, I'm no hacker. It's just VFS on top of HTTP. To me it looks slightly less complicated, but beauty is in the eye of the beholder ...

    Quote: Originally posted by joyofdata
    In general the settings in the "Webserver reply" section remain a secret to me ...
    The documentation is quite explicit about most of the settings, I believe.

    Quote: Originally posted by joyofdata
    "Add filename to result filesname" - this sounds like I can change the target filename depending on "something".
    What it actually does is add the filename to a list of result filenames.
    This special list lets you collect filenames from several steps or job entries for post-processing.
    See step "Get files from result".

    Quote: Originally posted by joyofdata
    Would it be possible to row-wise copy the target file to another file depending on a field?
    Everything is possible, even when I don't understand it.

    Check the attached demo and ask again, if you dare ...
    So long, and thanks for all the fish.

  5. #5

    The solution you provided in HTTP.zip works fine, thanks.

    But where does DL come from?

    Code:
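    // Request URL: the service address plus the file name taken from the current row.
    var url = service + "?file=" + param;
    // Local target path: the folder held in DL (empty string if unset) plus the same file name.
    var target_file = getVariable("DL", "") + "/" + param;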
    In Settings, its default value is set to ?DL?. But what is '?DL?'?



    I am pretty sure you know what I mean by hacky ... it's not an insult, by the way. Given that Kettle provides all sorts of joins that you don't have to write a script for, it surprises me that none of the steps labelled "HTTP" offers a convenient way to download more than one file.
    Would it be possible to row-wise copy the target file to another file depending on a field?
    Kettle processes the steps row-wise. So would it be possible to have a file provided in row #N downloaded to 'filename' and then moved to 'filename_N' and have that happen for all coming rows?

    Thanks in advance

    Raffael
    Last edited by joyofdata; 06-12-2013 at 03:02 PM.

  6. #6
    Join Date: Jun 2012; Posts: 5,534

    ?DL? is just a placeholder that shows up when you forget to provide a value for this parameter.
    In Spoon you would provide this value at the outermost job; with Kitchen you would use a command line parameter.

    Quote: Originally posted by joyofdata
    Kettle processes the steps row-wise. So would it be possible to have a file provided in row #N downloaded to 'filename' and then moved to 'filename_N' and have that happen for all coming rows?
    Of course it is, you only must add a sequence and adjust the JavaScript code.
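    As a rough sketch of that adjustment, written in plain JavaScript rather than inside the Kettle script step: the function name `numberedTarget` and the field name `seq` are assumptions. In the transformation, `seq` would come from an added sequence field and the folder from `getVariable("DL", "")`.

```javascript
// Plain-JavaScript sketch. `seq` stands in for a sequence field added to each
// row, and `downloadDir` for the value read via getVariable("DL", "").
// Both names are assumptions, not taken from the demo.
function numberedTarget(downloadDir, param, seq) {
  var dot = param.lastIndexOf(".");
  // No extension (or a leading dot only): append "_N" at the end.
  if (dot <= 0) {
    return downloadDir + "/" + param + "_" + seq;
  }
  // Otherwise insert "_N" just before the extension.
  return downloadDir + "/" + param.substring(0, dot) + "_" + seq + param.substring(dot);
}

console.log(numberedTarget("/tmp/dl", "data.zip", 3));
```

    Placing the number before the extension keeps the files recognisable as zip archives; appending it after the extension would match 'filename_N' literally.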
    So long, and thanks for all the fish.

  7. #7

    That means the question marks serve no purpose? (Similar to ${...}, for example?)

    Of course it is, you only must add a sequence and adjust the JavaScript code.
    I understand what you mean. But this is not what I meant. Not that important anyway as both of your solutions do the trick.

    Thanks!

  8. #8
    Join Date: Jun 2013; Posts: 3

    Hello:

    Thanks for asking this question. I am a bit baffled with a similar situation.

    I am in a slightly different situation where the server responding to my HTTP calls returns files with names that are different from the parameters that were sent. All I want to do is save those files under the names the server sends. Is there a way to store the names of the files being downloaded in a variable, the way marabu did with the parameter variable?

    I thought this was going to be a lot easier than what I have experienced so far! Thanks.

  9. #9
    Join Date: Jun 2012; Posts: 5,534

    Are those files public resources i.e. can I access the server?
    If not, is there a public service resembling your scenario?
    At least, be a bit more specific regarding the way you request the files from the server (URL query parameters and meaning).
    And don't forget to describe what you get from the server.
    Ideally you attach an HTTP log showing the request and response headers.

    If the name of the resource isn't reflected by a query parameter, you may have a chance to learn a suggested filename from the HTTP headers.
    So long, and thanks for all the fish.

  10. #10
    Join Date: Jun 2013; Posts: 3

    Yes. I have the query string below. I think we can concatenate the parameters that are sent to the server and make our own filename. That will work fine. But I just thought it would be simpler to somehow save the files as they are downloaded.

    http://oasis.caiso.com/mrtu-oasis/Gr...tdate=20130303

    Variables groupid and startdate are the two values we can send to the server. And, the server responds with a file named 20130303_20130303_PUB_RTM_GRP_N_N_xml.zip.
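    The concatenation idea could look roughly like the sketch below. It is plain JavaScript, the function name `fileNameFromParams` and the group value `SOME_GROUP` are made up, and the `.zip` extension is assumed only because the server's reply is a zip file.

```javascript
// Hypothetical helper: build a local filename from the two request parameters.
function fileNameFromParams(groupid, startdate) {
  // Replace anything outside a conservative character set so the value
  // is safe to use inside a filename.
  var safeGroup = String(groupid).replace(/[^A-Za-z0-9_.-]/g, "_");
  return safeGroup + "_" + startdate + ".zip";
}

// "SOME_GROUP" is a placeholder; the real groupid value is not shown here.
console.log(fileNameFromParams("SOME_GROUP", "20130303"));
```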

    Thanks!

  11. #11
    Join Date: Jun 2012; Posts: 5,534

    The filename is advertised via a Content-Disposition HTTP header.
    While your browser does analyze that header, Kettle does not.
    Since Kettle doesn't give you access to the response headers either, it's easiest to derive the target filename from the parameters, as you concluded yourself.
    If you couldn't live with that, you would have to rely on heavy scripting, patch the step source code or propose a feature request.
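    For anyone who does go the scripting route outside of Kettle, the header parsing itself is the small part. Below is a minimal sketch in plain JavaScript, assuming you already have the raw Content-Disposition value in hand. The function name and the sample header string are assumptions; only the filename in it comes from this thread. The extended `filename*=` form is deliberately not handled.

```javascript
// Hypothetical helper: pull the suggested filename out of a
// Content-Disposition header value. Returns null if none is present.
function fileNameFromContentDisposition(headerValue) {
  // Match filename="quoted value" or filename=token; filename*= is ignored.
  var match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(headerValue || "");
  if (!match) {
    return null;
  }
  var name = match[1] !== undefined ? match[1] : match[2];
  // Keep only the last path segment so a crafted name cannot point
  // outside the target folder.
  return name.split(/[\\/]/).pop();
}

// Sample header value assumed for illustration.
var sample = 'attachment; filename="20130303_20130303_PUB_RTM_GRP_N_N_xml.zip"';
console.log(fileNameFromContentDisposition(sample));
```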
    So long, and thanks for all the fish.

  12. #12
    Join Date: Jun 2013; Posts: 3

    Thank you for your response. Your previous post was very helpful for building file names based on parameters.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.