Hitachi Vantara Pentaho Community Forums

Thread: Delete step or Execute SQL script

  1. #1
    Join Date
    Mar 2015
    Posts
    190

    Default Delete step or Execute SQL script

    Hi,

    I am using PDI 6.1, MySQL, and Java 1.7.

    I am using Table Input --> Delete step to delete data by comparing indexed columns. It is very slow at the Delete step; I have 2,000,000 records that need to be compared and deleted, and I can't see where I am making a mistake.

    Could you please tell me which is the better approach: the Delete step or the Execute SQL script step?


    [Attached image: Delete step.jpg]

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You know that you're going to be doing 2M Delete statements, right?
    It would be much easier on your DB if you can delete where ID is between A and B.
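
    For illustration, a minimal sketch of the difference, assuming a hypothetical table my_table keyed by id:

        -- Delete step: one statement per incoming row, ~2M round trips
        DELETE FROM my_table WHERE id = 1001;
        DELETE FROM my_table WHERE id = 1002;
        -- ...and so on, once per row

        -- Range delete: a single statement covers the whole span
        DELETE FROM my_table WHERE id BETWEEN 1001 AND 2001000;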

  3. #3
    Join Date
    Feb 2017
    Posts
    13

    Default

    From the ETL side:
    * I have also faced this sort of issue, and I try to boost performance by breaking the data into independent datasets. In the Table Input step, group the rows into 4 or more groups using a field, then split the dataset into parts with Filter Rows steps and use n Delete steps (see the sketch below).
    * Don't break the data into too many subsets.
    * Make sure all your subsets are independent, i.e., no key value repeats across them.
    * Launch multiple step copies (I recommend 2, since we are already using multiple Delete steps).
    * If you can build the DELETE statement in the Table Input step and execute it with an Execute SQL script step, combining that with the above approach will speed things up much more.
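
    A minimal sketch of the grouping idea, assuming a hypothetical staging table staging_deletes keyed by id:

        -- Table Input: tag each row with a group number derived from the key
        SELECT id, MOD(id, 4) AS grp
        FROM staging_deletes;

    A Filter Rows (or Switch/Case) step then routes grp = 0..3 to four parallel Delete steps, each matching on id against the target table.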

    From the database side:
    * Tune any DB server limits on deleting rows, if applicable.
    * Table partitioning may also help.
    * Improve your DB high availability (HA) setup.

  4. #4
    Join Date
    Mar 2015
    Posts
    190

    Default

    Hi rsdp,

    Could you please post your sample transformation? It would help me resolve my issue ASAP.

    I am looking into your steps.
    I hope you have already succeeded with this issue, so could you swap out your real table and column names and post it?

    (I just want to see the logic for grouping rows into 4 or more groups using a field, then splitting the dataset into parts with Filter Rows and using n Delete steps.)

    Thanks for your support.
    ra nala


  5. #5
    Join Date
    Mar 2015
    Posts
    190

    Default

    Quote Originally Posted by gutlez View Post
    You know that you're going to be doing 2M Delete statements, right?
    Yes gutlez, I am trying to run about 2M DELETE statements, but not every time; it depends on the incremental data loads.

    Quote Originally Posted by gutlez View Post
    It would be much easier on your DB if you can delete where ID is between A and B.
    Yes, you are correct, it is easier from the DB side, but we need to handle Dev, Preprod, and Production environments, and also different customer projects, so we are looking for a generic solution, sir.

  6. #6
    Join Date
    Feb 2017
    Posts
    13

    Default

    Quote Originally Posted by ranala View Post
    Hi rsdp,

    Could you please post your sample transformation? It would help me resolve my issue ASAP.

    I am looking into your steps.
    I hope you have already succeeded with this issue, so could you swap out your real table and column names and post it?

    (I just want to see the logic for grouping rows into 4 or more groups using a field, then splitting the dataset into parts with Filter Rows and using n Delete steps.)

    Thanks for your support.
    ra nala


    Ranala, try to build a job in your environment and share its output so that I can help you; deleting 2M rows is not a simple process. Use the logic below and let me know if you face any issues.

    In Table Input step 1:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q1'

    In Table Input step 2:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q2'

    In Table Input step 3:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q3'

    In Table Input step 4:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q4'

    Your single KTR should look like:
    Table Input 1 -------> Execute SQL script for each row, with 2 copies (increase the commit size)
    Table Input 2 -------> Execute SQL script for each row, with 2 copies (increase the commit size)
    Table Input 3 -------> Execute SQL script for each row, with 2 copies (increase the commit size)
    Table Input 4 -------> Execute SQL script for each row, with 2 copies (increase the commit size)


    Note:
    * I am assuming that the 4 sets of data are independent.
    * The field used to break up the datasets depends on your data; in my case I am assuming the quarter of the year.
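
    For illustration, each Table Input would then emit one ready-to-run statement per row, along these lines (table name and ids are hypothetical):

        delete from my_table where id = 101;
        delete from my_table where id = 205;

    The Execute SQL script step runs the statement field with "Execute for each row?" enabled, so every incoming row fires its own statement.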

  7. #7
    Join Date
    May 2014
    Posts
    358

    Default

    What is the speed (rows per second) of the Delete step when you run it? You probably won't get much more than 1000-3000 r/s, but that could be sufficient if you have just a few million rows to delete.
    Is there a composite index with all the keys on the table?
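
    For example, if the Delete step compares on columns key_a and key_b (hypothetical names), a matching composite index would look like:

        -- MySQL: one index covering all columns the Delete step compares on
        CREATE INDEX idx_delete_keys ON my_table (key_a, key_b);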

  8. #8
    Join Date
    Mar 2015
    Posts
    190

    Default

    No sir, we don't have a composite index with all the keys on the table.

    If I create a composite index, can I then drop the individual single-column indexes?

    Thanks for your support, Lukfi.

  9. #9
    Join Date
    Sep 2011
    Posts
    152

    Default

    You can generate all the DELETE statements, put them in a file, and run that file (or run the queries from it). That will improve your delete operation.

    Also, on the DB side you can change your configuration; for example, for MySQL, increase innodb_buffer_pool_size to 2G.
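
    A minimal sketch of both ideas (file name and size are illustrative; tune them for your server):

        # my.cnf -- a larger buffer pool keeps more of the index in memory
        [mysqld]
        innodb_buffer_pool_size = 2G

        # run the generated statement file against the database
        mysql -u myuser -p mydb < delete_statements.sql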

  10. #10
    Join Date
    Mar 2015
    Posts
    190

    Default

    Hi rajeshbcrec,
    As I said in my previous post, we are not hard-coding DELETE statements; they are generated as part of the incremental load that deletes the data.


    Thank you
