Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Distinct Clause issue for MySQL in PDI-CE-5.3

  1. #1
    Join Date
    Mar 2015
    Posts
    190

    Default Distinct Clause issue for MySQL in PDI-CE-5.3

    Hi Sir,Madam,

    I am using PDI-CE-6.0, Java 1.7 64 bit , MySQL database. Trying to load data nearly like 4000000 records from source to target. to avoid duplicate records i kept DISTINCT clause in my extraction query. as per my DBA input when i kept DISTINCT clause then query is taking long time to process the records and CPU usage is going to take 100%.

    we can use distinct clause to avoid duplicate records in SQL queries and alternative solution is we can write sub select queries but my data is 4000000 records so it is not good idea to kept sub select for my joins(why because i am using nearly 12 table left joins)

    What is the alternative DISTINCT clause in PDI ? i hope all most of MySQL database users are facing issue...am i correct.

    below is the process, trying to loading the data. Please suggest me...

    Name:  data.jpg
Views: 31
Size:  7.1 KB

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Have a look at Unique-Rows - sorted input is required!
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Mar 2015
    Posts
    190

    Default

    Marabu Sir,

    Modified my .ktr using samples in data integration folder. i attached my samples(.ktr) could you please check, whether i kept logic correct or not? why because i confused about your statement "- sorted input is required" but i wasn't find sort info in Unique rows step.

    is Unique rows step process like (My actual result will go to Table output and duplicate records find out then duplicate records will go dummy step,is it correct ?
    Please help me Sir, correct me if i am doing wrong, don't mind please sir.

    Name:  duplicate.png
Views: 34
Size:  15.3 KB
    Attached Files Attached Files

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    I wonder about that SQL statement that gave your DBA so much trouble.
    But let's concentrate on the Unique-Rows step. If you didn't suppress all the helpful warnings in Spoon, you should have seen the following dialogue after closing the step settings dialogue:

    Name:  UR-WARN.jpg
Views: 32
Size:  13.3 KB

    So much for the sort requirement. If you know how Unique-Rows works on an input stream, you don't need the warning, but there it is.
    You could use an ORDER-BY clause in your Table-Input step, but it's probably wiser to use a Sort-Rows step in front of the Unique-Rows step, given the strained relationship with your DBA by now.
    Are you sure you need to compare each and every field in your set of 25?
    Why are there duplicates in the first place?
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Mar 2015
    Posts
    190

    Default

    Thank you very much Marabu Sir, I tried with minimal records with "Unique rows" and "sort rows" steps which are having duplicate records, working fine(i am not getting duplicate issues) BUT missing few records if i use Data movement as Round robin option, If i use Data movement as copy data to next steps then getting all records.

    as per PDI functionality when i use Data movement as Round robin option then data is distributing equally.

    Name:  un.jpg
Views: 29
Size:  13.5 KB ________________________ Name:  copy.jpg
Views: 25
Size:  7.7 KB

    Will work on my huge data(whatever i have attached .ktr in previous link) and update the status.


    Quote Originally Posted by marabu View Post
    .
    Are you sure you need to compare each and every field in your set of 25?
    No Sir, I no need to compare all 25 but only one columns is " prdsld " this is primary key in my table. i should not get duplicate row for this but when i apply left join to other tables then i am getting duplicates. as per MySQL forum member suggestion there are few bugs in MySQL database which are related to joins and need to fix in future releases. below is the link. and i have verified in bugs.mysql.com

    http://forums.mysql.com/read.php?24,...941#msg-636941


    Quote Originally Posted by marabu View Post
    Why are there duplicates in the first place?
    Having duplicates because of left joins.
    Last edited by ranala; 11-14-2015 at 12:05 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.