Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: Unique rows processes different number of rows on subsequent inputs

  1. #1

    Default Unique rows processes different number of rows on subsequent inputs

    Or rather, subsequent runs with the exact same input.

    I have a transformation that reads a CSV, sorts the records by 3 columns, uniques the them by the same columns, then tries to insert them into the database. When running on the test data set, the transformation fails 50% or more of the time with a uniqueness constraint violation. Whenever I see this error from the database, I also see in the Kettle logs that the uniqueness step did not process all of the records.

    I can replicate this now on two machines, database in identical state, rerunning the exact same transformation and data files. I hate to claim non-deterministic behavior, but literally I just hit up arrow and return until it works. I'll attach the transformation in case anyone can tell me if there's something obvious I'm screwing up.

    Here's some log output on failure, notice that only 14,317 out of 51,276 records were uniqued prior to the database error:

    INFO 09-03 01:54:58,868 - Read Tracking CSV - Line number : 50000
    INFO 09-03 01:54:58,871 - Sort rows - Linenr 50000
    INFO 09-03 01:54:58,875 - Read Tracking CSV - Finished processing (I=51277, O=0, R=0, W=51276, U=0, E=0)
    ERROR 09-03 01:54:58,997 - Insert/update click events - Unexpected error :
    ERROR 09-03 01:54:58,997 - Insert/update click events - org.pentaho.di.core.exception.KettleStepException:
    Error in step, asking everyone to stop because of:

    Error inserting/updating row
    Duplicate entry '8672880-11056881' for key 'report_exacttarget_results_user_id_cell_id_unique'

    org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:307)
    org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
    java.lang.Thread.run(Unknown Source)

    INFO 09-03 01:54:58,998 - Insert/update click events - Finished processing (I=2, O=1, R=2, W=1, U=0, E=1)
    INFO 09-03 01:54:58,998 - Filter bad fields - Finished processing (I=0, O=0, R=742, W=741, U=0, E=0)
    INFO 09-03 01:54:58,998 - load_daily_result_files - load_daily_result_files
    INFO 09-03 01:54:58,998 - Unique rows - Finished processing (I=0, O=0, R=14317, W=10606, U=0, E=0)
    When it works:
    INFO 08-03 17:55:20,609 - Read Tracking CSV - Finished processing (I=51277, O=0, R=0, W=51276, U=0, E=0)
    INFO 08-03 17:55:21,717 - Sort rows - Finished processing (I=0, O=0, R=51276, W=51276, U=0, E=0)
    INFO 08-03 17:55:21,785 - Unique rows - Linenr 50000
    INFO 08-03 17:55:21,799 - Unique rows - Finished processing (I=0, O=0, R=51276, W=40120, U=0, E=0)
    INFO 08-03 17:56:00,862 - Filter bad fields - Finished processing (I=0, O=0, R=40120, W=40120, U=0, E=0)
    INFO 08-03 17:56:00,864 - Write invalid tracking record to file - Finished processing (I=0, O=1201, R=1200, W=1200, U=0, E=0)
    INFO 08-03 17:56:43,709 - What is the event? - Finished processing (I=0, O=0, R=38920, W=38920, U=0, E=0)
    INFO 08-03 17:56:43,713 - Add clicked constant - Finished processing (I=0, O=0, R=8803, W=8803, U=0, E=0)
    INFO 08-03 17:56:43,714 - Add bounced constant - Finished processing (I=0, O=0, R=1525, W=1525, U=0, E=0)
    INFO 08-03 17:56:44,722 - Insert/update click events - Finished processing (I=8803, O=8796, R=8803, W=8803, U=7, E=0)
    Any hints?
    Thanks,
    David
    Attached Files Attached Files
    Last edited by djantzen; 03-08-2011 at 10:47 PM.

  2. #2

    Default

    It appears that the Unique step was a red herring.

    I've got a workaround by blocking on one Insert/Update step until another one finishes. Together these two steps will sometimes modify the same row, and so it appears in some cases that instead of one inserting and the other updating the record, they both insert. This surprises me since InnoDB is supposed to do row level locking, but I don't know of any other way that the uniqueness constraint could ever by violated when using Insert/Update.

  3. #3
    Join Date
    Apr 2008
    Posts
    4,690

    Default

    It looks like this is meant to be a two-stage piece.
    Would a Job not be more appropriate then?

    Job:
    Transform 1: Tracking
    Transform 2: Sent
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.