Download Multiple Files
Mick_data
06-04-2010, 06:04 AM
Hi.
I am using the SFTP step to download files from a server.
The issue is that the SFTP step downloads only one file at a time (no concurrent downloads).
Is there any setting that I need to change in order to download multiple files at the same time?
I even tried to set up a "Start" step that links to two SFTP steps, but instead of both starting at the same time, only one step downloads the file.
Note that when I use FileZilla to download the same files from the same server, I get multiple files at the same time.
Has anyone got a suggestion for a workaround, or do I need to create a Jira?
Thanks.
Mick
MattCasters
06-04-2010, 06:46 AM
Right-click the "Start" job entry and try the parallel option.
Mick_data
06-04-2010, 07:26 AM
Hi Matt.
Tried your suggestion and it worked!
The only drawback is that I have to create an SFTP step for each file to download.
It would be great if concurrent downloading of multiple files were enabled by default in the SFTP (or FTP) step, with an option for the user to define the maximum number of simultaneous downloads (if any).
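The feature requested here, concurrent downloads with a user-defined cap, can be sketched in Python as a bounded thread pool. This is not how the PDI SFTP step works internally; `download_all`, `fetch`, and `max_downloads` are hypothetical names, and `fetch` stands in for whatever client actually transfers one file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def download_all(
    files: Iterable[str],
    fetch: Callable[[str], str],
    max_downloads: int = 4,
) -> List[str]:
    """Run fetch(name) for every file, with at most max_downloads at once.

    fetch is a hypothetical callable that transfers one remote file and
    returns the local path it wrote.
    """
    if max_downloads < 1:
        raise ValueError("max_downloads must be at least 1")
    # max_workers caps how many transfers run simultaneously; the remaining
    # files wait in the executor's queue until a worker thread is free.
    with ThreadPoolExecutor(max_workers=max_downloads) as pool:
        # map() yields results in input order; if a transfer raised, reading
        # its result re-raises that exception here.
        return list(pool.map(fetch, files))
```

With `max_downloads=10` and ten matching files, all ten transfers could start at once; a smaller value queues the rest until a slot frees up.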
Should I file a Jira to request these options?
Mick.
MattCasters
06-04-2010, 09:08 AM
It's already possible to use wildcards (regular expressions) to download multiple files in one go.
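A minimal Python sketch of what a regular-expression wildcard does to a remote file listing. It assumes the whole file name has to match (`re.fullmatch`); whether PDI matches the full name or only a substring is an assumption worth checking. `select_files` is a hypothetical helper.

```python
import re
from typing import List


def select_files(listing: List[str], wildcard: str) -> List[str]:
    r"""Return the names in listing that match the regular expression.

    Assumes the whole name must match, so ".*\.txt" selects "a.txt"
    but not "a.txt.bak".
    """
    pattern = re.compile(wildcard)
    return [name for name in listing if pattern.fullmatch(name)]


if __name__ == "__main__":
    names = ["a.txt", "b.csv", "c.txt", "a.txt.bak"]
    print(select_files(names, r".*\.txt"))  # ['a.txt', 'c.txt']
```

Because the field takes a regular expression rather than a shell glob, `.*\.txt` selects the text files, while a glob-style `*.txt` is not even a valid pattern in Python (`re.compile` raises `re.error`).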
Since you can specify file names with variables you can also loop over a set of file names (with some tinkering). There are samples out there that show you how.
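The loop described here can be pictured as substituting each file name into a path that contains a variable, then running the download entry once per name. A Python sketch, using `string.Template` only because it happens to share PDI's `${...}` variable syntax; `FILENAME`, `run_for_each`, and `run_entry` are hypothetical, and a real job would do this with rows and variables rather than a Python loop.

```python
from string import Template
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_for_each(
    file_names: List[str],
    remote_path_template: str,
    run_entry: Callable[[str], T],
) -> List[T]:
    """Run run_entry once per file name, like a job looping over rows.

    remote_path_template contains a ${FILENAME} placeholder (a hypothetical
    variable name); run_entry stands in for one run of the download entry.
    """
    template = Template(remote_path_template)
    results = []
    for name in file_names:
        # Equivalent of setting the variable before each iteration.
        remote_path = template.substitute(FILENAME=name)
        results.append(run_entry(remote_path))
    return results


if __name__ == "__main__":
    paths = run_for_each(["a.csv", "b.csv"], "/in/${FILENAME}", lambda p: p)
    print(paths)  # ['/in/a.csv', '/in/b.csv']
```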
Finally, you can also directly access the files using Apache VFS in PDI 4. http://wiki.pentaho.com/display/COM/Configuring+Kettle+VFS
One would think that kinda covers the topic ;-)
Mick_data
06-04-2010, 09:28 AM
Hi Matt.
It's already possible to use wildcards (regular expressions) to download multiple files in one go.
As far as I know, it downloads multiple files, but not "concurrently".
When I tested my SFTP steps, SFTP downloaded one file, and only when that download had completed was a second file downloaded.
To explain it better:
If I use my wildcard .*\.txt and I have 10 text files, I would like to initiate 10 downloads at the same time.
At the moment, according to my testing, it downloads all 10 files, but one at a time.
Regarding VFS:
I had a look at your link but could not understand how that could help.
But I'm not well versed in programming!
I still think that an option on the step interface to define the number of concurrent downloads would be useful.
Mick
MattCasters
06-04-2010, 11:06 AM
You can directly read files using the VFS driver:
For example, if you use the Excel Input step, you could specify a file like this:
sftp://username:password@server:/file/location/somefile.xls
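To show how such a URI breaks down, here is a Python sketch that builds and splits one. It percent-encodes the credentials so characters such as `@` or `:` in a password cannot be mistaken for delimiters; whether a particular VFS version decodes them the same way is an assumption to verify. `build_sftp_uri` and `split_sftp_uri` are hypothetical helpers. The builder leaves out the colon after the host when no port is given, and the splitter also accepts the `server:` form used above.

```python
from typing import Dict, Optional
from urllib.parse import quote, unquote, urlsplit


def build_sftp_uri(user: str, password: str, host: str, path: str,
                   port: Optional[int] = None) -> str:
    """Assemble sftp://user:password@host[:port]/path with escaped parts."""
    # safe="" also escapes '/', ':' and '@' inside the credentials.
    userinfo = quote(user, safe="") + ":" + quote(password, safe="")
    hostport = host if port is None else f"{host}:{port}"
    if not path.startswith("/"):
        path = "/" + path
    # quote() keeps '/' by default; other special characters get escaped.
    return f"sftp://{userinfo}@{hostport}{quote(path)}"


def split_sftp_uri(uri: str) -> Dict[str, Optional[str]]:
    """Split an sftp:// URI back into decoded user, password, host, path."""
    parts = urlsplit(uri)
    if parts.scheme != "sftp":
        raise ValueError(f"not an sftp URI: {uri!r}")
    # urlsplit does not percent-decode, so unquote each component here.
    # The port is deliberately not read: a bare "host:" has an empty port.
    return {
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "path": unquote(parts.path),
    }
```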
Feel free to make a feature request in JIRA.
All the best,
Matt
Mick_data
06-04-2010, 11:51 AM
You can directly read files using the VFS driver:
Thanks: will give it a go.
Feel free to make a feature request in JIRA.
Will do!
Thanks.
Mick
Ralph
01-24-2011, 11:21 AM
Hi,
Is it possible to download several subfolders, and if so, without knowing the names of the folders? For example, copy everything (folders and files) that is in \data?
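One way to copy a whole tree without knowing folder names is to walk it: list a folder, queue every subfolder found, and repeat. A Python sketch under the assumption that some client provides a `list_dir(path)` function returning `(name, is_dir)` pairs; that function, `collect_files`, and the `/`-separated paths are hypothetical (the question writes `\data`).

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical: list_dir(path) returns (name, is_dir) pairs for one remote
# folder, e.g. built on top of an SFTP client's directory listing.
ListDir = Callable[[str], Iterable[Tuple[str, bool]]]


def collect_files(root: str, list_dir: ListDir) -> List[str]:
    """Return every file path under root, descending into all subfolders.

    Subfolder names come from the listings themselves, so none need to be
    known in advance.
    """
    files: List[str] = []
    stack = [root.rstrip("/") or "/"]
    while stack:
        folder = stack.pop()
        for name, is_dir in list_dir(folder):
            # Many servers list "." and ".."; following them would loop.
            if name in (".", ".."):
                continue
            path = folder.rstrip("/") + "/" + name
            if is_dir:
                stack.append(path)  # visit this subfolder later
            else:
                files.append(path)
    return sorted(files)
```

Each returned path could then be handed to whatever performs the actual download.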
Thanks & greets,
Ralph