Hitachi Vantara Pentaho Community Forums

Thread: Why can Kettle be idle for a whole day?

  1. #1

    Default Why can Kettle be idle for a whole day?

    I tested Kettle on a busy server; actually, it was not that busy:
    Code:
                 total       used       free     shared    buffers     cached
    Mem:          1700       1161        538          0         87        573
    -/+ buffers/cache:        500       1199
    Swap:          895          0        895
    I extracted data from a MySQL server, sorted it, merge-joined it with another table, and then inserted the result into another busy data source.

    But after waiting a whole day, it is stuck at the sort step (it never reached the merge join), just sitting idle: no progress, but no errors either. How can it be so patient without giving me any information about what it is doing?

    I hope you can tell me why Kettle can sleep for so long.

    btw, it's still running.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    It's hanging ... usually the reason is a deadlock on the database side (e.g. if you read from and write to the same table, which some databases don't handle well), or, in one case we had before in PDI, a merge step going haywire (when you merge two hops that originally came from the same step).

    Hard to tell more without more information. As a first step, I would find the nearest DBA (if you're doing database stuff).
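
    If you want to see for yourself whether the database is the culprit, something like the following will show the running statements and any lock waits or deadlocks InnoDB has recorded. This is only a rough sketch, assuming MySQL/InnoDB and the pymysql driver; the host and credentials are placeholders.
    Code:
    # Quick look at what MySQL is doing while the transformation sits idle.
    # Hypothetical host/credentials -- replace with your own.
    import pymysql

    conn = pymysql.connect(host="db-host", user="monitor",
                           password="secret", database="mysql")
    try:
        with conn.cursor() as cur:
            # Long-running INSERT/UPDATE statements show up here with their state.
            cur.execute("SHOW FULL PROCESSLIST")
            for row in cur.fetchall():
                print(row)
            # The TRANSACTIONS section of this output lists lock waits and the
            # statements involved in the latest deadlock InnoDB detected.
            cur.execute("SHOW ENGINE INNODB STATUS")
            print(cur.fetchone()[2])
    finally:
        conn.close()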

    Regards,
    Sven

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    If you can, always sort and join on the database, folks; it's always faster.
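
    For the case in this thread, that means doing the sort and the join in one query on the source side instead of using Sort rows + Merge Join steps; in PDI you would simply put that query in a single Table Input step. Below is a rough sketch, assuming both tables live in the same MySQL database and the pymysql driver is available; the table and column names are invented for illustration.
    Code:
    # Sketch: let the database do the join and the sort, then just stream the
    # result. Table/column names are made up for illustration.
    import pymysql

    SQL = """
        SELECT u.user_id, u.name, x.external_id
        FROM user_list u
        LEFT JOIN external_ids x ON x.user_id = u.user_id
        ORDER BY u.user_id
    """

    conn = pymysql.connect(host="db-host", user="etl",
                           password="secret", database="da")
    try:
        with conn.cursor() as cur:
            cur.execute(SQL)             # the server joins and sorts
            for row in cur.fetchall():   # rows arrive already joined and ordered
                pass                     # hand them to the next step / output
    finally:
        conn.close()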

  4. #4
    Join Date
    Apr 2007
    Posts
    2,009

    Default

    not if your database is heavily loaded, or on duff legacy hardware...

    I used to work for a BI company that specialised in working with people who had crappy old database servers, and they got good performance with an appliance-style solution.

    There are many shops out there that still have really crappy old DB servers! Of course, plenty more have shiny new boxes, and that's perfect too!

  5. #5

    Default

    It's like you're a mind reader, codek.
    This is a signature.... everyone gets it.

    Join the Unofficial Pentaho IRC channel on freenode.
    Server: chat.freenode.net Channel: ##pentaho

    Please try and make an effort and search the wiki and forums before posting!
    Checkout the Saiku, the future of Open Source Interactive OLAP(http://analytical-labs.com)

    http://mattlittle.files.wordpress.co...-bananaman.jpg

  6. #6
    Join Date
    Apr 2007
    Posts
    2,009

    Default

    lol, I just spend all day guessing what you're going to say and getting in first.

  7. #7

    Default

    Thanks very much for your help.

    It was indeed a deadlock problem on the output data source.

    The data source, also a MySQL database, was serving many purposes. There were over 50 connections inserting data at the same time. After I stopped those connections, it started working, though still not as fast as I expected.

    I wanted to share my case with you, so you might be able to help me resolve it. Actually, the database has quite a weird design.

    We have a very long user list with users' information. The list is so long that we divided it into hundreds of tables, so that many instances can work at the same time to keep appending to it. Like the user list, the user information tables were also divided; as a result we have more than 1000 tables in this database, which we call DA.

    This list needs to be matched against another identity (just as a student has a unique ID at school and also a driving licence ID); those identities are saved in a remote database (DB), which is also split into many tables, currently 10 of them.

    So my task is to look up rows of DA in DB to fetch the unique ID. I made 2 loops, running 1000*10 times.

    And then I came to the biggest difficulty: most of the time DB does not have the matching rows for DA, and the clients asked me to use a left join so that all DA data is moved even when no ID is found in DB. Therefore I need to use an insert/update to move the data into the target data source, so that whenever a new ID is found later the data can be updated; the target is also divided into 1000 tables. Because the target source was so busy, I got a deadlock, which is why I raised this thread.

    It's running now, but this transformation makes me deeply uncomfortable. I thought I might use Python to write some scripts to insert the data instead.
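
    If I do write that script, what I have in mind is to batch the rows and upsert them with INSERT ... ON DUPLICATE KEY UPDATE, retrying whenever the busy target reports a deadlock (MySQL error 1213) instead of just hanging or failing. This is only a sketch: the table and column names are made up, and it assumes the pymysql driver.
    Code:
    # Sketch of a batched upsert into one of the sharded target tables, with a
    # retry when MySQL reports a deadlock (error 1213). Names are hypothetical.
    import time
    import pymysql

    UPSERT = """
        INSERT INTO target_user_info_{shard} (user_id, external_id, name)
        VALUES (%s, %s, %s)
        ON DUPLICATE KEY UPDATE external_id = VALUES(external_id),
                                name = VALUES(name)
    """

    def load_batch(conn, shard, rows, retries=5):
        sql = UPSERT.format(shard=shard)
        for attempt in range(retries):
            try:
                with conn.cursor() as cur:
                    cur.executemany(sql, rows)   # one round trip per batch
                conn.commit()
                return
            except pymysql.MySQLError as exc:
                conn.rollback()
                # 1213 = ER_LOCK_DEADLOCK; back off and retry the whole batch
                if exc.args and exc.args[0] == 1213 and attempt < retries - 1:
                    time.sleep(2 ** attempt)
                    continue
                raise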

    Would you please help me review the design and give me some suggestions on how to improve it?

    Thanks,
