Hitachi Vantara Pentaho Community Forums

Thread: Significant slow-down with certain hop configurations

  1. #1
    DEinspanjer Guest

    Significant slow-down with certain hop configurations

    I was testing a piece of a transformation where I had a filter to clean up some data and then further down the line, I had a step that ran in two copies.

    In my test case, I observed something a little odd: if the step that runs in two copies has two hops feeding it, it runs much faster than if it has only one hop feeding it.

    I'm not sure whether there is a legitimate explanation for the slow-down, but I figured the test case might be interesting to see regardless, so I posted it here as an attachment before filing a bug report.

    What do you think?

  2. #2


    Could it be that two copies of the step means two threads, and therefore faster processing?

    I ran the transformation and noticed that the value of Speed (r/s) for D2 was significantly smaller [8889.9] than that of D0 [17786.2]. Have you noticed the same thing?

    In case it matters, here is my configuration:
    Linux 2.6.18-92.1.18.el5 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
    6 GB RAM.
    PDI 3.1 GA.
    Last edited by acbonnemaison; 11-26-2008 at 05:00 PM.

  3. #3
    DEinspanjer Guest


    Originally posted by acbonnemaison:
    "Could it be because 2 copies(step) = 2 threads thereby faster?"

    No, changing which hop is enabled doesn't create any new threads. The most it does is change which step has two input buckets.

    Originally posted by acbonnemaison:
    "I ran the transformation and noticed that the value of Speed (r/s) for D2 was significantly smaller [8889.9] than that of D0 [17786.2]. Have you noticed the same thing?"

    That is sort of what I'm talking about. Note that D2 has two entries in the run table because it runs in two copies. But if you disable the "slow" hop and enable the "fast" hop (by right-clicking the arrow line), you should see that it performs much faster that way.

  4. #4
    Join Date: Nov 1999 | Posts: 9,729


    Interesting. What happens is that there are actually 4 hops at play if you enable the extra hop.

    D1.0-D2.0
    D1.0-D2.1
    D0.0-D2.0
    D0.0-D2.1

    In other words, each copy of D2 receives rows from both steps. Perhaps it's simply because the total buffer size doubles?
    Personally I think it's because there are now 2 threads delivering data to D2 instead of 1.

    Matt
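As an illustration of the hop list above, here is a minimal Python sketch (not Kettle code) that expands step-level hops into copy-level hops. The function name `expand_hops` is hypothetical, and the sketch assumes every source copy connects to every target copy, which matches the case in this thread (single-copy D0 and D1 feeding two copies of D2). The row set size of 10,000 is also an assumption used only for the arithmetic.

```python
from itertools import product


def expand_hops(sources, target, target_copies):
    """Expand step-level hops into copy-level hops.

    sources: dict mapping source step name -> number of copies.
    Assumes full fan-out: each source copy feeds each target copy.
    """
    hops = []
    for name, copies in sources.items():
        for src, dst in product(range(copies), range(target_copies)):
            hops.append(f"{name}.{src}-{target}.{dst}")
    return hops


ROW_SET_SIZE = 10_000  # assumed rows per buffer, for illustration only

slow = expand_hops({"D1": 1}, "D2", 2)            # only D1 feeds D2
fast = expand_hops({"D1": 1, "D0": 1}, "D2", 2)   # extra hop enabled

for label, hops in (("slow", slow), ("fast", fast)):
    print(label, hops, "total buffer rows:", len(hops) * ROW_SET_SIZE)
```

With the extra hop enabled, the sketch yields four copy-level hops instead of two, so the total buffering in front of D2 doubles and two upstream threads write into it rather than one.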

  5. #5

    One thing you also have to note about these situations is that when a slow step is followed by a fast one, the buffers between them are almost always close to empty.
    That might cause the degree of parallelism to be lower in the slow situation than in the fast one.

    I noticed that if you lower the Row set size (Transformation settings), the difference becomes much smaller.
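To make the buffer behaviour concrete, here is a minimal Python sketch of one hop as a bounded buffer between two step threads. It is not Kettle code: `queue.Queue` stands in for a row set, `run_pipeline` and the `SENTINEL` end marker are hypothetical names, and the sketch makes no claim about actual throughput.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def run_pipeline(n_rows, row_set_size):
    """Push n_rows through one bounded buffer.

    Returns (rows_received, peak_fill), where peak_fill is the largest
    buffer fill level the consumer observed.
    """
    row_set = queue.Queue(maxsize=row_set_size)  # stand-in for a row set
    received = []
    peak = 0

    def producer():
        for i in range(n_rows):
            row_set.put(i)  # blocks while the buffer is full
        row_set.put(SENTINEL)

    def consumer():
        nonlocal peak
        while True:
            peak = max(peak, row_set.qsize())
            row = row_set.get()  # blocks while the buffer is empty
            if row is SENTINEL:
                break
            received.append(row)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(received), peak


count, peak = run_pipeline(1000, 10)
print("rows:", count, "peak fill:", peak)
```

The writer blocks when the buffer is full and the reader blocks when it is empty, so the observed fill level depends on which side is slower. Calling `run_pipeline` with different `row_set_size` values is one way to experiment with the effect described above.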

  6. #6
    DEinspanjer Guest


    Interesting stuff. Hopefully someone else might find the thread useful down the road if they run into a similar situation.

  7. #7

    Well, the thing is: this is not a real-life situation. The Dummy steps don't actually do anything beyond causing a bit of overhead. I would be very careful not to draw too many conclusions from it.
