Hitachi Vantara Pentaho Community Forums

Thread: Update using sequential scan rather than index

  1. #1
    Join Date
    Sep 2007
    Posts
    2

    Update using sequential scan rather than index

    We have a transformation populating a table that recently started chugging on the updates. When the table was at 2.3 million rows, everything was pretty snappy. We started noticing a slowdown, and when we looked into it, the table was at 2.7 million rows with updates taking more than 5 seconds per row.

    We kicked up the logging level on the DB and found that a sequential scan was being performed on the updates rather than using the primary key index. We have another table that sits around 2.3 million rows right now and it is using the indexes for updates. When we run the same update query through pgadmin, it's done in a few milliseconds.
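    For reference, the plan can be checked with EXPLAIN; something along these lines (table and column names are placeholders) distinguishes the two plans:

    ```sql
    -- Illustrative only: substitute the real table and primary-key column.
    EXPLAIN ANALYZE
    UPDATE my_table
       SET some_col = 'new value'
     WHERE id = 12345;

    -- A healthy plan reports "Index Scan using my_table_pkey on my_table";
    -- the problem plan reports "Seq Scan on my_table" instead.
    ```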

    We've rebuilt the pk index on the table and checked every setting we can think of on the DB side, to no avail. We also tried dropping all other indexes on the table and saw no improvement. This only happens when the update's where clause is against the primary key; it uses other indexes just fine. Is there a chance Kettle is forcing the sequential scan, or is there a Kettle config setting that would affect this? Do any other culprits or settings come to mind?

    Thanks,
    Raquel

  2. #2
    Join Date
    Oct 2007
    Posts
    10


    Are you using Postgres by chance? I ran into a similar issue where the queries being run against the DB were using

    (column=value or column is null)


    This happens on the combination lookup step for me. Postgres versions before 8.3 don't index null values unless you create an explicit index to do so. Kettle still checks for nulls even when the column is not null and the primary key, which confuses me.

    There are a few solutions to this problem:

    1. Create a partial index on the nulls:
       create index idxwhatever on table(column) where column is null;

    2. Use the hashcode. Lookups are super fast, but it didn't work for me: too many values landed on the same hashcode, enough to send me looking for another solution.

    3. Don't use the combination lookup step; use the dimension lookup/update step instead. I like this solution because Kettle isn't doing the OR with null.

    4. Upgrade to Postgres 8.3 beta 4; the standard indexes cover nulls now.
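    To illustrate the first workaround, the predicate shape the combination lookup step generates and the partial index that covers its IS NULL branch look roughly like this (table and column names are placeholders):

    ```sql
    -- Shape of the generated lookup predicate:
    --   SELECT tk FROM dim_table WHERE lookup_col = $1 OR lookup_col IS NULL;
    -- On Postgres < 8.3 a plain btree index does not satisfy the IS NULL
    -- branch, so the planner falls back to a sequential scan.

    -- Partial index covering just the NULL rows:
    CREATE INDEX idx_dim_lookup_null ON dim_table (lookup_col)
      WHERE lookup_col IS NULL;
    ```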

    If you aren't using Postgres and this is not the problem you were having, then check to see why your database isn't hitting the index. This isn't really Kettle's fault. Post your DBMS specs and the query that's missing the index.

  3. #3
    Join Date
    Sep 2007
    Posts
    2


    Yes, we are using Postgres. However, we're having problems when the only where clause condition is the primary key match, which would eliminate unindexed nulls as the culprit.

    We think we've got the problem figured out, though. Postgres falls back to a sequential scan when the compared data types don't match. Our primary key is a bigint. It gets set as type "Big Number" when we pull it into Kettle, thus resulting in a type mismatch when we try to perform updates.

    To test this, we made the primary key an integer and pulled it into Kettle as an integer. Updates performed then used the primary key's index.
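    For anyone wanting to confirm the same thing, the mismatch shows up directly in EXPLAIN; casting the parameter (table and column names are placeholders) reproduces both plans:

    ```sql
    -- pk_col is bigint; Kettle's Big Number binds as numeric, so the
    -- comparison is numeric vs. bigint and the pkey index is skipped:
    EXPLAIN UPDATE my_table SET some_col = 'x' WHERE pk_col = 12345::numeric;
    -- -> Seq Scan on my_table

    -- With matching types, the primary key index is used:
    EXPLAIN UPDATE my_table SET some_col = 'x' WHERE pk_col = 12345::bigint;
    -- -> Index Scan using my_table_pkey on my_table
    ```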

    We didn't test whether pulling the bigint pk into Kettle as an integer would also cause a type mismatch, but we couldn't find anything stating the value range of Kettle's integer type, so it seemed a moot point: we couldn't assume it would handle the full range of Postgres' bigint.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.