Hitachi Vantara Pentaho Community Forums

Thread: Partially blocking step with UDJC

  1. #1
    Join Date
    Oct 2014
    Posts
    8

    Hi,

    I am trying to write a partially blocking step with UDJC (User Defined Java Class).
    I have incoming rows with one column (Request) and want to mimic batch processing.
    I would like to concatenate that column over x rows and emit the result as a completely new row (BulkRequest), like this:


    private String bulkrequest = "";
    private int i = 0;
    private int batchSize = 1000;

    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
    {
        Object[] r = getRow();

        // Flush the accumulated batch when the input ends or the batch is full
        if (r == null || i == batchSize) {
            Object[] s = RowDataUtil.allocateRowData(data.outputRowMeta.size());
            get(Fields.Out, "BulkRequest").setValue(s, bulkrequest);
            putRow(data.outputRowMeta, s);
            bulkrequest = "";
            i = 0;
            if (r == null) {
                // No more input rows: signal completion
                setOutputDone();
                return false;
            }
        }

        // Add the current request to the pending batch
        i++;
        bulkrequest = bulkrequest + get(Fields.In, "Request").getString(r);
        return true;
    }

    Unfortunately, the running transformation slows down and I eventually get a java.lang.OutOfMemoryError.

    Thanks

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    No need for code - here's a flow-based solution.
    Attached Files
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Oct 2014
    Posts
    8

    Thanks, Marabu!
    I just wanted to avoid too many steps.
    Still, is this some kind of memory leak?

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    One of your top design goals with Kettle should be to avoid imperative (Java or JavaScript) code.
    If you think your design could be simplified, then, feel free to discuss it here in the forum.

    Sorry, but no comment on your code from me.

  5. #5
    Join Date
    Oct 2014
    Posts
    8

    Hi Marabu,

    Unfortunately, your solution also consumes the entire Java heap. The rows come from a Table Input step, and I tried to throttle its throughput by lowering defaultRowPrefetch, but to no avail.
    I will have to refactor the whole transformation and somehow make the Table Input step run in batches, with iteration.

    Thanks anyway.

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    The flow-based approach I suggested can't heal the damage done elsewhere.
    Obviously, it wasn't your UDJC-code that brought your machine down.

  7. #7
    Join Date
    Aug 2016
    Posts
    290

    If you're appending a lot of strings, do use StringBuilder instead for much improved performance.
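
    To illustrate the difference: repeated `+` on a String copies the whole accumulated text on every append, while a StringBuilder appends into one growing buffer. The following is a minimal standalone sketch outside Kettle; the class and method names (ConcatDemo, joinWithPlus, joinWithBuilder) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ConcatDemo {
    // Naive approach: each concatenation allocates a new String
    // and copies everything accumulated so far.
    static String joinWithPlus(List<String> requests) {
        String bulk = "";
        for (String r : requests) {
            bulk = bulk + r;
        }
        return bulk;
    }

    // StringBuilder appends into a single growing buffer;
    // the text is copied once more in toString().
    static String joinWithBuilder(List<String> requests) {
        StringBuilder bulk = new StringBuilder();
        for (String r : requests) {
            bulk.append(r);
        }
        return bulk.toString();
    }

    public static void main(String[] args) {
        List<String> requests = Arrays.asList("a;", "b;", "c;");
        System.out.println(joinWithPlus(requests));
        System.out.println(joinWithBuilder(requests));
    }
}
```

    Both methods return the same text; only the amount of intermediate copying differs, which matters more as the batch grows.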

  8. #8
    Join Date
    Oct 2014
    Posts
    8

    I tried StringBuilder like this, but I run out of memory even sooner. Maybe I did something wrong:


    import java.lang.StringBuilder;

    private int i;
    private StringBuilder bulkrequest;

    public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface)
    {
        // Create the buffer once, before any rows are processed
        bulkrequest = new StringBuilder();
        return parent.initImpl(stepMetaInterface, stepDataInterface);
    }

    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
    {
        Object[] r = getRow();

        // Flush the accumulated batch when the input ends or 1000 rows are collected
        if (r == null || i == 1000) {
            Object[] s = RowDataUtil.allocateRowData(data.outputRowMeta.size());
            get(Fields.Out, "bulkrequest").setValue(s, bulkrequest.toString());
            putRow(data.outputRowMeta, s);
            i = 0;
            bulkrequest = new StringBuilder();
            if (r == null) {
                // No more input rows: signal completion
                setOutputDone();
                return false;
            }
        }

        // Add the current request to the pending batch
        i++;
        bulkrequest.append(get(Fields.In, "Request").getString(r));
        return true;
    }

  9. #9
    Join Date
    Aug 2016
    Posts
    290

    Looks like you did it exactly right. No idea why that is too heavy.

    I normally initialize objects inside an "if (first)" clause, so it is interesting to see the init method used. Maybe there is a .clear() method that could be used instead of instantiating a new StringBuilder object, but it shouldn't make too much of a difference.
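
    On resetting the buffer: java.lang.StringBuilder has no clear() method, but setLength(0) empties it for reuse. A small sketch, with the class and method names (BuilderReset, reuse) invented for illustration:

```java
public class BuilderReset {
    // Returns what the builder holds after being reused for a second batch.
    static String reuse() {
        StringBuilder sb = new StringBuilder();
        sb.append("first batch");
        sb.setLength(0);      // discard the content, keep the allocated buffer
        sb.append("second");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reuse()); // prints "second"
    }
}
```

    Note that setLength(0) keeps the buffer's capacity, so after one very large batch the memory stays allocated, whereas new StringBuilder() lets the old buffer be garbage collected.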
    Last edited by Sparkles; 10-13-2017 at 05:22 AM.

  10. #10
    Join Date
    Oct 2014
    Posts
    8

    Hi Sparkles

    I tested the performance. The StringBuilder solution is extremely fast compared to the original. Thanks.
    And by setting defaultRowPrefetch in the Table Input step and the batch size in the UDJC, I might have found the balance between performance and heap utilization!
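
    For a rough feel of how the batch size drives heap use, the character data of one concatenated batch can be estimated as below. This is only a back-of-the-envelope sketch: it assumes 2 bytes per char (UTF-16 storage) and ignores object overhead, and the average request length of 2000 characters is a hypothetical figure, not one taken from this thread.

```java
public class BatchHeapEstimate {
    // Number of characters in one concatenated batch.
    static long batchChars(int batchSize, int avgRequestLength) {
        return (long) batchSize * avgRequestLength;
    }

    // Approximate bytes for that text, assuming 2 bytes per char
    // and ignoring object headers and buffer slack.
    static long approxBytes(int batchSize, int avgRequestLength) {
        return 2L * batchChars(batchSize, avgRequestLength);
    }

    public static void main(String[] args) {
        // Hypothetical: 1000 requests of about 2000 characters each
        System.out.println(approxBytes(1000, 2000)); // prints 4000000
    }
}
```

    Several such copies can be alive at once (the builder's buffer, the String produced by toString(), and any output rows still waiting to be consumed downstream), so the real footprint is a multiple of this figure.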


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.