Hitachi Vantara Pentaho Community Forums
Thread: Editor slowness with large transformation

  1. #1

    Editor slowness with large transformation

    With small transformations, the editor is fairly quick to respond. But I have a large transformation that loads a fact table and does lookups on each of the dimensions. There are roughly 100+ transformation steps, organized in a fairly linear manner, as I process a fact row from the source system and do the usual mapping from keys to dimension ids. I am also doing checks for invalid references and logging these as I go.

    What I have observed is that as I move further down the linear chain of transformations, the editing gets slower and slower. Just opening a node to edit it can result in a wait of around a minute to open the node up.

    I am guessing that Spoon is doing some analysis of the entire chain of transformation steps that precede the one that I am editing. Is that correct? Is there any way around this? It becomes so slow it is nearly unusable.

    I am using an Oracle database as the repository. Is Spoon doing a complete traversal of the graph every time I edit? I have tried editing with the database cache both on and off, but it makes no difference.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Well... the more steps you use the slower it will become, some "analysis" does happen. Short term the only way out would be to split up your transformations. Most of my transformations have 5 to 15 steps.

    Regards,
    Sven

  3. #3

    slowness

    I am not sure how to break this large transformation up...
    Is it doing database processing as I edit things in the editor? Would it be faster to use file-based storage?

    Subtransformations won't work because I am using the stream lookup operator to find the dimension id values. I was told on this forum before that with each call of a subtransformation it will reread the database in the stream lookup.

    I don't consider 10 dimensions for a fact table to be that large.... It would be nice if there was a way to turn off the analysis while editing...

  4. #4

    CPU not busy

    Still having the slowness problems. It's odd: the machine has 2GB, the CPU is mostly idle (max of 14% utilization), and there are 2 CPUs, but I suspect Spoon editing is single-threaded. Plenty of memory is allocated to the JVM.

    When I just try and edit the name of a step, I get the hourglass and have to wait, yet the CPU is not really doing anything.

    Very frustrating...

  5. #5
    Join Date
    May 2006
    Posts
    4,882

    Try switching off your virus scanner. If that makes it faster, you can probably exclude the Kettle jar files from virus scanning.

    Regards,
    Sven

  6. #6

    slowness and Cancel button

    So my transformation is 70 steps. I have tried to do a verify of the transformation. After it had run for 2 hours with the progress meter at only about 10%, I hit the cancel button. But it took 15 minutes for things to return after hitting the cancel button.

    What is going on during verification and when opening up a step that is fairly far downstream in the transformation? Steps at the beginning are very fast, but once you get much further down the transformation pipeline, it takes forever. CPU use never exceeds 15%. Is there a LOT of database processing being performed?

  7. #7
    Join Date
    May 2006
    Posts
    4,882

    I would expect 70 steps to be slower than 10 or so, but not 2 hours. Attach your transformation or send it.

    Regards,
    Sven

  8. #8

    more problems at runtime

    I tried running the transformation. It ran for a while, but then just stopped processing rows.

    Also, I thought I would try taking out the logging of invalid references. I thought I could copy/paste the transformation and make edits to remove the logging steps. But when I try to paste the transformation from the clipboard (after copying it to the clipboard), nothing happens, it does not make the copy into the new empty transformation. Is copy/paste supposed to work?
    Last edited by DavidJordan; 07-19-2007 at 11:27 PM.

  9. #9
    Join Date
    May 2006
    Posts
    4,882

    20 seconds to verify.
    Sven

  10. #10

    interesting...

    My machine has 2 CPUs, running at 2.13 GHz, 2GB of memory, running Windows XP.
    I am running Java 1.6, yet many of the Pentaho tools seem spec'd for Java 1.4. Could that be the issue?

    By the way, I tried taking out all of the logging of invalid references, it was much faster in the editor and was also able to do the actual load in reasonable time. But I'd prefer to do the logging of invalid references...

    Are there any tunable parameters you may have set that would yield the better performance?

  11. #11
    Join Date
    May 2006
    Posts
    4,882

    I run JDK1.5 at the moment. Kettle 2.5 is compiled against JDK1.4 but should be able to run against JDK5... possibly also JDK6. However there's one specific version of JDK6 which seems to be causing some problems (but you would be very unlucky to have just that version).

    Maybe someone else can try as well.

    Regards,
    Sven

  12. #12
    Join Date
    Nov 1999
    Posts
    9,729

    Slow to verify: caused by a whole series of Oracle connection timeouts on my box since I don't have access to your databases.
    That in itself can be the cause on your end too OR it could be that one of the queries executes very slowly.

  13. #13

    database issue

    This morning we tried to do the verify on 2 other computers and tried JDK 1.4, 1.5, and 1.6. They all exhibited the slowness. But we discovered that if we saved the transformation as XML, brought up Spoon without a connection to the repository, imported the transformation from the XML file, and then did the verify, the verification only took 2 seconds.

    So there is definitely an issue related to the database interaction which is causing the slowness. Our Oracle DBA says there do not appear to be any issues on the Oracle side.

    Are you using log4j for logging? Can I increase the logging level to get more info? Where can this be set? I'd like to resolve the database issue, but need more info.

    I do see the following in server.log for yesterday:
    2007-07-17 15:42:13,781 INFO [DWConnection] DWConnection - Statement canceled!

    But that may have been when running the transformation, not verifying.

    I did not see anything in the server.log for today, I did some verifications today (though I killed them because they seemed to be taking too long). But there were no log records for today. Does logging take place during verification?

    I guess I am surprised that when doing a verification, or simple edits of a transformation in memory, that it is necessary to do a bunch of database level accesses. Isn't all the information represented as Java objects in memory?

    Why doesn't Spoon just open a single connection to the database? Why would you see a whole series of connection timeouts?
    Last edited by DavidJordan; 07-18-2007 at 11:07 AM.

  14. #14
    Join Date
    Nov 1999
    Posts
    9,729

    We actually cache all the metadata that are the results of database queries/lookups.
    However, it has to perform them at least once successfully. (or perhaps you turned caching off in the options?)

    There might be an issue with the repository, but I kinda doubt it.

    To turn the logging level way up, you can open any logging tab (right-click on the transformation name in the upper left tree and select "Open logging view"). There is a "Log Settings" button at the bottom.

    HTH,

    Matt

  15. #15

    update

    In Options, "Use database cache" was checked. I tried it both ways, checked and unchecked; it did not seem to make a difference.


    I set the logging level to debug (is that the right level?).

    I then ran the verification.

    Is it normal to see "New database connection defined" for the same connection, repeatedly during the verification?

    I also see "Loading step with ID: xxx" for the same steps, over and over again. Does that mean it is loading it from the database?

    What should I look for in the log file? I don't see any database-specific exceptions being logged.

  16. #16

    more info

    We found in the Oracle Listener log that a request was being made to connect to the repository EVERY second while Spoon was running the verification. Connections were being accepted and given out. Is there a reason for this?

  17. #17
    Join Date
    Nov 1999
    Posts
    9,729

    I think it's a problem in the mapping step. That step loads the used transformation for verification. I suspect there is no caching on that, which is causing the problem for you.
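
    To illustrate what "no caching" on the mapping step would mean, here is a minimal sketch. It is not Kettle's code: `MappingCache` and `fake_load` are hypothetical names, and `fake_load` merely stands in for whatever expensive repository or file read fetches a sub-transformation. The point is only that each distinct sub-transformation gets loaded once, however many steps refer to it.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class MappingCache(Generic[T]):
    """Hypothetical helper: memoizes loaded sub-transformations by location."""

    def __init__(self, loader: Callable[[str], T]) -> None:
        self._loader = loader            # the expensive load function
        self._cache: Dict[str, T] = {}   # location -> loaded object
        self.loads = 0                   # counts real (uncached) loads

    def get(self, location: str) -> T:
        # Only call the loader the first time a location is requested.
        if location not in self._cache:
            self._cache[location] = self._loader(location)
            self.loads += 1
        return self._cache[location]


def fake_load(location: str) -> dict:
    # Assumption: stands in for a slow repository or file read.
    return {"location": location}


cache = MappingCache(fake_load)
requested = [
    "get_nonnulltopography.ktr",
    "log_invalidstringreference.ktr",
    "log_invalidstringreference.ktr",  # same sub-transformation used twice
]
for name in requested:
    cache.get(name)

print(cache.loads)  # 2: the repeated name was served from the cache
```

    Without such a cache, a verification pass would pay the full load cost for every mapping step it checks, even when several steps use the same sub-transformation.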

  18. #18

    ???

    When you say "mapping step", what are you referring to? I have the "Use database cache" in options turned on. Is there something else that must also be set?

    I had not set "Use a connection pool" for the connections. Should that be turned on???
    Last edited by DavidJordan; 07-18-2007 at 02:25 PM.

  19. #19

    Matt's response

    Now that I am reading Matt's response again, are you saying there is a bug in your implementation with how you map data between memory and the database? (Your use of the word "step" made me think earlier that you were referring to one of the steps in my transformation.) I really need to get this resolved ASAP, or an important deadline will be missed. Is this something that can be fixed in a day or so? Let me know either way. If it can't, it means dropping the use of a database repository and migrating everything to files.

    I am surprised this has not been seen before. Do most people just store transformations in files?
    Last edited by DavidJordan; 07-18-2007 at 08:46 PM.

  20. #20
    Join Date
    May 2006
    Posts
    4,882

    Quote:
        Now that I am reading Matt's response again, are you saying there is a bug in your implementation with how you map data between memory and the database? (Your use of the word "step" made me think earlier that you were referring to one of my transformation steps in my transformation.)

    No, it means that for the mapping step, Spoon needs to go back to the database to get its subtransformation.

    Quote:
        If it can't, it means dropping the use of a database repository and migrating everything to files.

        I am surprised this has not been seen before. Do most people just store transformations in files?

    Move to files for production and save yourself a couple of gray hairs.

    Regards,
    Sven

  21. #21

    gray hairs

    Mine aren't turning gray, they just die and fall off. I'd give anything to have a thick frock of gray hair...

    So it's the subtransformations causing the problem? That is good to know.

  22. #22
    Join Date
    May 2006
    Posts
    4,882

    lol... may be one of the reasons. Personally I would use files for anything productive, but I may be a bit conservative in those things.

    E.g. do you now take backups of the schema in which you put the repository?

    Regards,
    Sven

  23. #23

    still slow with file-based transformations

    I exported my transformation, and it and its subtransformations are now stored in the file system. I have restarted Spoon without connecting to the repository and opened the transformation. I am now running verification. So far, it has run for 10 minutes. The progress window shows the step it is currently verifying. The first few steps it verified were Add constants steps that just add 3 constants. These took 1 minute to verify, which still seems very wrong.

    Yesterday I had reported that a verify ran very fast when it was file based, but I later discovered that it had just terminated early because the subtransformations were still in the database repository.

    So despite moving everything out of the database repository, verification is still running very slowly. What I need to assess now is whether editing the transformation is running better (it was taking 5 minutes to respond to simple mouse click operations).

    If you want me to send you my files so you can investigate this, let me know where to email it.

  24. #24

    exception and too many database connection attempts!

    Below is just 2 seconds of the log output during verification. Way too many attempts to make database connections!
    Also, verification died earlier due to NullPointerException, details are below.

    Some example log output:

    2007-07-19 11:54:52,406 INFO [DWConnection] DWConnection - New database connection defined
    2007-07-19 11:54:52,484 INFO [DWConnection] DWConnection - Connected to database.
    2007-07-19 11:54:52,484 INFO [DWConnection] DWConnection - Connection to database closed!
    2007-07-19 11:54:52,484 INFO [TPF Connection] TPF Connection - New database connection defined
    2007-07-19 11:54:52,671 INFO [TPF Connection] TPF Connection - Connected to database.
    2007-07-19 11:54:52,734 INFO [TPF Connection] TPF Connection - Connection to database closed!
    2007-07-19 11:54:52,734 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:52,734 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/get_nonnullmorphologycode.ktr]
    2007-07-19 11:54:52,750 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:52,750 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/get_nonnulltopography.ktr]
    2007-07-19 11:54:52,750 INFO [DWConnection] DWConnection - New database connection defined
    2007-07-19 11:54:52,828 INFO [DWConnection] DWConnection - Connected to database.
    2007-07-19 11:54:52,828 INFO [DWConnection] DWConnection - Connection to database closed!
    2007-07-19 11:54:52,828 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:52,843 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/log_invalidstringreference.ktr]
    2007-07-19 11:54:52,843 INFO [DWConnection] DWConnection - New database connection defined
    2007-07-19 11:54:52,921 INFO [DWConnection] DWConnection - Connected to database.
    2007-07-19 11:54:52,921 INFO [DWConnection] DWConnection - Connection to database closed!
    2007-07-19 11:54:52,937 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:52,937 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/log_invalidstringreference.ktr]
    2007-07-19 11:54:52,937 INFO [DWConnection] DWConnection - New database connection defined
    2007-07-19 11:54:53,031 INFO [DWConnection] DWConnection - Connected to database.
    2007-07-19 11:54:53,031 INFO [DWConnection] DWConnection - Connection to database closed!
    2007-07-19 11:54:53,031 INFO [TPF Connection] TPF Connection - New database connection defined
    2007-07-19 11:54:53,234 INFO [TPF Connection] TPF Connection - Connected to database.
    2007-07-19 11:54:53,281 INFO [TPF Connection] TPF Connection - Connection to database closed!
    2007-07-19 11:54:53,296 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:53,296 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/get_nonnullmorphologycode.ktr]
    2007-07-19 11:54:53,312 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:53,312 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/get_nonnulltopography.ktr]
    2007-07-19 11:54:53,312 INFO [DWConnection] DWConnection - New database connection defined
    2007-07-19 11:54:53,390 INFO [DWConnection] DWConnection - Connected to database.
    2007-07-19 11:54:53,390 INFO [DWConnection] DWConnection - Connection to database closed!
    2007-07-19 11:54:53,406 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:53,406 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/log_invalidstringreference.ktr]
    2007-07-19 11:54:53,406 INFO [TPF Connection] TPF Connection - New database connection defined
    2007-07-19 11:54:53,625 INFO [TPF Connection] TPF Connection - Connected to database.
    2007-07-19 11:54:53,671 INFO [TPF Connection] TPF Connection - Connection to database closed!
    2007-07-19 11:54:53,671 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:53,671 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/get_nonnullmorphologycode.ktr]
    2007-07-19 11:54:53,687 INFO [SharedObjects] SharedObjects - Reading the shared objects file [file:///C:/Documents and Settings/djordan/.kettle/shared.xml]
    2007-07-19 11:54:53,687 INFO [Loading Mapping from repository] Loading Mapping from repository - Mapping transformation was loaded from XML file [Y:/DW/Prototype/Transformations/get_nonnulltopography.ktr]
    2007-07-19 11:54:53,687 INFO [DWConnection] DWConnection - New database connection defined
    2007-07-19 11:54:53,765 INFO [DWConnection] DWConnection - Connected to database.
    2007-07-19 11:54:53,765 INFO [DWConnection] DWConnection - Connection to database closed!
    2007-07-19 11:54:53,765 INFO [TPF Connection] TPF Connection - New database connection defined
    2007-07-19 11:54:53,968 INFO [TPF Connection] TPF Connection - Connected to database.
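
    To put a number on "way too many", a small script can tally the "New database connection defined" lines per second and per connection. This is only a sketch: the regular expression assumes every line has the layout of the excerpt above (timestamp, level, source in square brackets, then the message after " - "), and `summarize` is a hypothetical helper name, not part of Kettle.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time>,<ms> LEVEL [source] source - message"
LOG_LINE = re.compile(
    r"^(?P<second>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<source>[^\]]+)\]\s+.*? - (?P<message>.*)$"
)


def summarize(lines):
    """Count new-connection events per second and per connection name."""
    per_second = Counter()
    per_source = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines, stack traces, etc.
        if match.group("message").startswith("New database connection defined"):
            per_second[match.group("second")] += 1
            per_source[match.group("source")] += 1
    return per_second, per_source


# Four lines copied from the excerpt above.
sample = [
    "2007-07-19 11:54:52,406 INFO [DWConnection] DWConnection - New database connection defined",
    "2007-07-19 11:54:52,484 INFO [DWConnection] DWConnection - Connected to database.",
    "2007-07-19 11:54:52,484 INFO [TPF Connection] TPF Connection - New database connection defined",
    "2007-07-19 11:54:53,031 INFO [DWConnection] DWConnection - New database connection defined",
]

per_second, per_source = summarize(sample)
for second, count in sorted(per_second.items()):
    print(second, count)
for source, count in sorted(per_source.items()):
    print(source, count)
```

    Running `summarize` over the full log file (for example `summarize(open(path))`, with `path` pointing at the log) would show whether the connection rate stays this high for the whole verification.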


    Exception data in the log:


    2007-07-19 11:57:40,000 ERROR [Load TissueSpecimens] Load TissueSpecimens - java.lang.NullPointerException
    at be.ibridge.kettle.trans.step.mapping.MappingMeta.check(MappingMeta.java:431)
    at be.ibridge.kettle.trans.step.StepMeta.check(StepMeta.java:685)
    at be.ibridge.kettle.trans.TransMeta.checkSteps(TransMeta.java:4244)
    at be.ibridge.kettle.spoon.dialog.CheckTransProgressDialog$1.run(CheckTransProgressDialog.java:72)
    at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:113)

    2007-07-19 11:57:40,015 ERROR [i18n] i18n - be.ibridge.kettle.core.exception.KettleException:
    Message not found in the preferred and failover locale: key=[AnalyseImpactProgressDialog.RuntimeError.ErrorCheckingTransformation.Exception], package=be.ibridge.kettle.spoon.dialog

    at be.ibridge.kettle.i18n.GlobalMessages.calculateString(GlobalMessages.java:311)
    at be.ibridge.kettle.i18n.GlobalMessages.getString(GlobalMessages.java:325)
    at be.ibridge.kettle.spoon.dialog.Messages.getString(Messages.java:19)
    at be.ibridge.kettle.spoon.dialog.CheckTransProgressDialog$1.run(CheckTransProgressDialog.java:76)
    at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:113)
