Hitachi Vantara Pentaho Community Forums
Thread: Processing batches of rows

  1. #1
    Join Date
    Feb 2017
    Posts
    2

    Processing batches of rows

    Hi,
    I'm extracting 10,000+ XML files, each resulting in one row, processing each row, and sending it as a web service request to a backend system.
    The problem is that the backend system gets overloaded and cannot handle 10,000+ web service requests at once. It can handle 100 requests well.

    What I want to achieve is the following:
    1. read all XML files
    2. process a batch of 100 rows
    3. wait until these 100 rows have been processed by the backend system
    4. then process a new batch of 100 rows
    5. and so forth until all batches are completed
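
    The steps above can be sketched outside of Kettle as follows. This is only a Python illustration of the intended control flow, not a PDI configuration; `send` is a hypothetical function standing in for the web service call, and the batch size of 100 comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(rows, send, batch_size=100):
    """Send rows batch by batch.

    `send` is a hypothetical callable that performs one request.
    A new batch starts only after every request of the previous
    batch has returned.
    """
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for batch in batched(rows, batch_size):
            # list() blocks until all requests in this batch are done
            results.extend(list(pool.map(send, batch)))
    return results
```

    With 250 rows and a batch size of 100, this would send batches of 100, 100 and 50 rows, never more than 100 requests in flight at once.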

    I cannot figure out how to do it. Processing row by row in a Job is too slow. The Single Threader does not support the HTTP Post step.
    So I was thinking of using "block until.." somehow, but so far I have not been able to get it working.

    Any suggestions on how I could do it?

    Thanks,
    Bert

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Have you considered the utility step Delay-Row?
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Feb 2017
    Posts
    2

    Yes, but it is not reliable, or I would have to set the delay very high, e.g. 1 second. The problem here is the backend system: sometimes it performs well, sometimes not. I think the best solution would be batch processing.

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    I had in mind that you prepare (aggregate) the batch so you have a batch per Kettle row.
    Then you either find a way to learn if the webservice is ready to continue or you use Delay-Row.
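
    The "ready to continue, or else delay" choice could look roughly like this. It is a Python sketch of the logic only, not a Kettle step; `is_ready` is a hypothetical check against the backend (the thread does not say whether one exists), and the delay and timeout values are assumptions.

```python
import time


def wait_until_ready(is_ready=None, fixed_delay=1.0, poll_interval=0.2,
                     timeout=30.0, sleep=time.sleep, clock=time.monotonic):
    """Pause between two batches.

    If `is_ready` (hypothetical backend check) is given, poll it until
    it returns True or `timeout` seconds have passed. Without a check,
    fall back to a fixed delay. Returns False only on timeout.
    """
    if is_ready is None:
        # No way to ask the backend: plain fixed delay
        sleep(fixed_delay)
        return True
    deadline = clock() + timeout
    while not is_ready():
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

    Calling this once per aggregated batch, rather than once per row, keeps the total waiting time small even when the delay itself has to be generous.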
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Aug 2011
    Posts
    360

    You can use the Job Executor or Transformation Executor step, where you specify the number of rows in a batch.
    For example, prepare a stream with all your XML filenames and send it to the Job Executor in batches of 100 rows.
    In the sub-job, put a transformation with Get rows from result to get the 100 filenames back, then read the XML files, then send the HTTP requests.
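
    To make the data flow of that setup concrete, here is a small Python analogy. It does not use any Kettle API; `executor`, `sub_job`, `read_file` and `post` are hypothetical names, and the flat XML layout is an assumption made for the example.

```python
import xml.etree.ElementTree as ET


def sub_job(filenames, read_file, post):
    """Stand-in for the sub-job: receives one batch of filenames,
    parses each XML file into a row, then posts that row."""
    responses = []
    for name in filenames:
        root = ET.fromstring(read_file(name))
        # Assumes a flat document: one child element per field
        row = {child.tag: child.text for child in root}
        responses.append(post(row))
    return responses


def executor(filenames, read_file, post, rows_per_batch=100):
    """Stand-in for the executor step: passes `rows_per_batch`
    filenames at a time to the sub-job and waits for it to return."""
    responses = []
    for start in range(0, len(filenames), rows_per_batch):
        batch = filenames[start:start + rows_per_batch]
        responses.extend(sub_job(batch, read_file, post))
    return responses
```

    The point of the split is that only filenames travel through the outer stream; the XML is read inside the sub-job, so each batch is read and sent before the next one starts.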


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.