Hitachi Vantara Pentaho Community Forums

Thread: About performance: Multiple copies of outputs, commit size, etc...

  1. #1
    Join Date
    Feb 2011
    Posts
    840

    Default About performance: Multiple copies of outputs, commit size, etc...

    so, basically, is there a smart way to find the optimal configuration, instead of tweaking all those settings by trial and error?

    Like, right now I'm getting a ridiculously low 70 rows/sec using 3 copies of a Table Output step writing to SQL Server, with a commit size of 700.
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert. Take my comments at your own risk.

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Do a ping to the SQL Server machine and divide 1000 by the ping value in milliseconds. That would be the maximum throughput in rows per second if you're on a high-latency network and every row costs a round trip.
    That's the sort of thing you can do to see what's going on.
    Or disable the Table Output step to see if it's not another step in your transformation that's slow.
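
    The ping arithmetic above can be sketched in a few lines of Python. This is only a rough upper-bound estimate, assuming the worst case where each network round trip carries a fixed number of rows. The function name `max_rows_per_second` and the `rows_per_round_trip` parameter are made up for illustration and are not part of PDI.

```python
def max_rows_per_second(ping_ms: float, rows_per_round_trip: int = 1) -> float:
    """Upper bound on throughput when each round trip carries a fixed number of rows."""
    if ping_ms <= 0:
        raise ValueError("ping_ms must be positive")
    # One second is 1000 ms, so this is how many round trips fit into a second.
    round_trips_per_second = 1000.0 / ping_ms
    return round_trips_per_second * rows_per_round_trip


print(max_rows_per_second(2))   # 500.0
print(max_rows_per_second(20))  # 50.0
```

    If inserts are batched so that one round trip carries many rows, the bound rises accordingly. The number is a ceiling set by latency alone, not a prediction of what the database will actually sustain.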

  3. #3
    Join Date
    Feb 2011
    Posts
    840

    Default

    hm... maybe this could help with suggestions...

    this is the transformation:

    this is the step metrics when it begins:

    and this is when it ends:


    that "Set Variables" is floating there because I'm running this transformation standalone as a test. it's part of a job, where I need the rowcount to be set as a variable...

    Tried disabling outputs and got this, when starting...


    and it finished in 15 minutes. 15 minutes. I feel like I'm doing something very wrong at the output...
    Last edited by joao.ciocca; 08-21-2012 at 05:39 PM.

  4. #4
    Join Date
    Feb 2011
    Posts
    840

    Default

    Quote Originally Posted by MattCasters View Post
    Do a ping to the SQL Server machine, divide 1000 by the ping value. That would be the maximal throughput if you're on a high latency network.
    hm... kay, I'm getting a 2 ms ping, what should I do with it? dividing 1000 by 2 gives me 500... so I can use 1 copy of the output, with a commit size of 500? sorry if this is way too basic, all my knowledge of these kinds of things has come on a need-to-know basis... like, if I need to do something, I go around the web and search for it =p and so I've been trying for the past half hour to find information about "commit size" and similar terms... but I haven't found any explanation good enough.

    Would someone mind giving me a hand with this? I just find it weird that a transformation starts at more than 2k rows/sec and ends at a couple dozen...

  5. #5
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You are copying rows from Select Values to NCR.
    Input on Select Values is 1035207 and output is 3105621 - Exactly 3 times the data. This means that for each of your NCR outputs, you are loading all the data. You would get better data throughput by using only one copy at this point. However, what I think you wanted to do is distribute the data between the 3 NCR outputs.

    This in turn is slowing down the data throughput all the way back!
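
    The difference between the two data movement modes can be sketched as a toy Python model. It is not PDI code: the helper names `copy_rows` and `distribute_rows` are hypothetical, and distribution is assumed to be round-robin.

```python
from itertools import cycle


def copy_rows(rows, n_copies):
    """Copy mode: every target step copy receives every row."""
    return [list(rows) for _ in range(n_copies)]


def distribute_rows(rows, n_copies):
    """Distribute mode (assumed round-robin): each row goes to exactly one target."""
    targets = [[] for _ in range(n_copies)]
    for target, row in zip(cycle(targets), rows):
        target.append(row)
    return targets


rows = range(10)
print([len(t) for t in copy_rows(rows, 3)])        # [10, 10, 10]
print([len(t) for t in distribute_rows(rows, 3)])  # [4, 3, 3]

# With the row count from the step metrics, copy mode triples the rows written:
print(1035207 * 3)  # 3105621
```

    In copy mode the total work grows with the number of step copies, while in distribute mode the same work is split between them.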
    Last edited by gutlez; 08-24-2012 at 06:31 PM.

  6. #6
    Join Date
    Feb 2011
    Posts
    840

    Default

    Quote Originally Posted by gutlez View Post
    However, what I think you wanted to do is distribute the data between the 3 NCR outputs.
    wow! I had disabled that "Copy or distribute" message, and never figured that when I created some hops, they would copy all the data to all the step copies!

    I re-enabled the message, made sure it was set to distribute... btw, Matt, are you around? Could this be requested on JIRA as an improvement? A context-menu item where you can select whether a given hop is set to copy or distribute?

    [edit]and look at that! now it's stable. 5x the output, all with 320-325 r/s =D
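
    As a rough sanity check on those numbers, here is a small Python estimate of the total run time. It assumes that "5x the output" means five copies of the output step, that each copy sustains the quoted rate for the whole run, and that the total is the 1035207 rows mentioned earlier in the thread. The function name is made up for illustration.

```python
ROWS_TOTAL = 1035207  # rows read by Select Values, as quoted earlier in the thread
N_COPIES = 5          # assumption: "5x the output" means five step copies


def estimated_minutes(total_rows: int, copies: int, rows_per_sec_per_copy: float) -> float:
    """Minutes needed to write total_rows when each copy sustains the given rate."""
    aggregate_rate = copies * rows_per_sec_per_copy
    return total_rows / aggregate_rate / 60


for rate in (320, 325):
    print(rate, round(estimated_minutes(ROWS_TOTAL, N_COPIES, rate), 1))
# 320 10.8
# 325 10.6
```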
    Last edited by joao.ciocca; 08-24-2012 at 06:44 PM.

  7. #7
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Context Menu:
    Right Click on the step (outgoing - Select Values step in your case) and select Data Movement -> Distribute data to next steps

    Note: On the Hop itself, it shows a pair of documents to show that you have Copy turned on. Distribute is the default.

  8. #8
    Join Date
    Feb 2011
    Posts
    840

    Default

    hmmm I get it now! thanks gut! =) it never showed up for me because every time I needed that, like going from 1 step to 2 different ones, I needed to copy everything instead of distributing... and since I answered the popup with "Copy" and checked "never ask again", it defaulted to Copy afterwards.

    oh! and I asked about the context menu 'cause I thought it'd show up on the hop's context menu, instead of the origin step =)

  9. #9
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    It doesn't make any sense since once the row is "on the hop" it's already copied, distributed or load balanced.
    But I'll keep in mind that adding informational dialogs and messages is useless if the developer chooses to ignore them.
    Even the copy icon on the hop with the pop-up hover information unfortunately didn't help you.
    Unfortunately, I'm beginning to think that no JIRA case or improvement could ever hope to fix that :-)
    Last edited by MattCasters; 08-24-2012 at 10:15 PM.

  10. #10
    Join Date
    Feb 2011
    Posts
    840

    Default

    makes me wanna jump from the window when I don't understand if you're being serious or sarcastic, Matt =p
    as I mentioned, the first few times the popup window asked me to select "Copy" or "Distribute", it was a case for copy - and so, after clicking "Copy" 5 times in less than 15 minutes, I just disabled it - which made Kettle switch the default behaviour to "Copy" instead of distribute... and I didn't know that leaving it as "Copy" while using multiple copies of an output would copy ALL rows to ALL copies of said output...

    and... no, I never thought of looking for the hover popup on the copy icon. My bad. I'll keep that in mind from now on! Promise.

    but... actually, now that I've looked again at the screenshots I posted above, I failed to mention that the "copy" icon only appeared on the hop to the output step because I had previously linked the "Select Values" step to more than one step, not only to the output that's pictured in the screenshot... before I linked it to other steps, there was no "Copy" icon. And after I removed the other steps, once gutlez told me to use "Distribute" instead of "Copy", it didn't matter if I deleted that hop and recreated it - it still kept showing up as a "Copy" hop.

    I had to go to Tools -> Options and reenable the "show copy or distribute dialog" to be able to switch that hop back to distribute...

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I'm not joking actually, we're constantly looking for ways to explain these core behaviour items better to the user.
    Unfortunately, besides tooltips, pop-up menus, dialogs... I don't know what else we can do.

    ;-)

    Matt

  12. #12
    Join Date
    Feb 2011
    Posts
    840

    Default

    well.. you could just make a popup that slaps the idiot in the face and says: "HEY, YOU HAVE TO READ THIS, IDIOT." =p
    something like: "if you understood what was written here, click no" - and if the idiot clicks yes, the popup just changes to a different message and shows up again =p


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.