Hitachi Vantara Pentaho Community Forums

Thread: Delete step or Execute SQL script

  1. #1
    Join Date
    Mar 2015
    Posts
    190

    Default Delete step or Execute SQL script

    Hi,

    I am using PDI 6.1, MySQL, and Java 1.7.

    I am using a Table Input --> Delete step to delete data by comparing indexed columns. It is very slow at the Delete step; I have 2,000,000 records that need to be compared and deleted, and I cannot figure out where I am making a mistake.

    Could you please tell me which is the better approach: the Delete step or the Execute SQL script step?


    [Attached image: Delete step.jpg]

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You know that you're going to be doing 2M Delete statements, right?
    It would be much easier on your DB if you know that you can delete where ID is between A and B.
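
    For illustration, a minimal sketch of this ranged approach, assuming a hypothetical table my_table with an integer key id; one ranged DELETE replaces millions of single-row statements:

        -- One ranged DELETE instead of 2M single-row deletes.
        -- Table and column names are placeholders; adapt to your schema.
        DELETE FROM my_table WHERE id BETWEEN 1000000 AND 3000000;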

  3. #3
    Join Date
    Mar 2015
    Posts
    190

    Default

    Quote Originally Posted by gutlez View Post
    You know that you're going to be doing 2M Delete statements, right?
    Yes gutlez, I am trying to run about 2M delete statements, but not every time; it depends on the incremental data loads.

    Quote Originally Posted by gutlez View Post
    It would be much easier on your DB if you know that you can delete where ID between A and B.
    Yes, you are correct, that is easy from the DB side, but we need to handle the Dev, Preprod, and Production environments, as well as different customer projects, so we are looking for a generic solution.
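
    As a hedged illustration of keeping the ranged delete generic across environments: the Execute SQL script step in PDI can substitute variables when "Variable substitution" is enabled, so the table name and bounds come from per-environment parameters rather than being hardcoded. The variable names below are hypothetical:

        -- ${DELETE_TABLE}, ${MIN_ID}, ${MAX_ID} are hypothetical parameters
        -- set per environment (Dev, Preprod, Production).
        DELETE FROM ${DELETE_TABLE} WHERE id BETWEEN ${MIN_ID} AND ${MAX_ID};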

  4. #4
    Join Date
    Feb 2017
    Posts
    13

    Default

    From the ETL side:
    * I have also faced this sort of issue, and I try to boost performance by breaking the data into independent datasets. In the Table input step, group the rows into 4 or more groups using a field; in the next step, split the dataset into parts using Filter rows and use n Delete steps.
    * Don't break the data into too many subsets/datasets.
    * Make sure all your datasets/subsets are independent, meaning no key value repeats across them.
    * Launch multiple copies (I recommend 2, since we are also using multiple Delete steps).
    * If you can generate the delete statements in the Table input step and execute them with an Execute SQL script step using the above approach, it will boost the speed even more.

    From the database side:
    * Tune any delete-row limitations on your DB server.
    * Table partitioning may also help (see the sketch after this list).
    * Improve your DB high availability.
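
    A hedged illustration of the partitioning idea, with hypothetical table, column, and partition names: if the rows to delete line up with a partition (for example, a quarter of the year), MySQL can drop the whole partition far faster than deleting row by row.

        -- Hypothetical table partitioned by quarter.
        CREATE TABLE my_table (
            id INT NOT NULL,
            quarter_of_year TINYINT NOT NULL
        )
        PARTITION BY LIST (quarter_of_year) (
            PARTITION q1 VALUES IN (1),
            PARTITION q2 VALUES IN (2),
            PARTITION q3 VALUES IN (3),
            PARTITION q4 VALUES IN (4)
        );

        -- Dropping a partition removes all of its rows without issuing
        -- any row-by-row DELETE statements.
        ALTER TABLE my_table DROP PARTITION q1;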
    Last edited by rsdp; 10-05-2017 at 02:54 AM.

  5. #5
    Join Date
    Mar 2015
    Posts
    190

    Default

    Hi rsdp,

    Could you please post your sample transformation? It would help us resolve my issue ASAP.

    I am looking into your steps. I hope you have already succeeded with this issue, so could you replace your real table and column names and post it?

    (I just want to see the logic for "group them into 4 or more groups using a field and in the next step split the dataset into parts using Filter rows and use n Delete steps".)

    Thanks for your support,
    ra nala

    Last edited by ranala; 10-05-2017 at 03:58 AM.

  6. #6
    Join Date
    Feb 2017
    Posts
    13

    Default

    Quote Originally Posted by ranala View Post
    Could you please post your sample transformation? It would help us resolve my issue ASAP.

    I am looking into your steps. I hope you have already succeeded with this issue, so could you replace your real table and column names and post it?

    (I just want to see the logic for "group them into 4 or more groups using a field and in the next step split the dataset into parts using Filter rows and use n Delete steps".)

    Ranala, try to build a job in your environment and send me its output so I can help you; deleting 2M rows is not a simple process. Use the logic below and let me know if you face any issues.

    In Table input step 1:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q1'

    In Table input step 2:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q2'

    In Table input step 3:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q3'

    In Table input step 4:
    select concat('delete from ', table_name, ' where id = ', id, ';'), quarter_of_year from table_name
    where quarter_of_year = 'q4'

    Your single KTR should look like:
    table input 1 -------> Execute SQL script, for each row, with 2 copies (increase the commit size)
    table input 2 -------> Execute SQL script, for each row, with 2 copies (increase the commit size)
    table input 3 -------> Execute SQL script, for each row, with 2 copies (increase the commit size)
    table input 4 -------> Execute SQL script, for each row, with 2 copies (increase the commit size)


    Note:
    * I am assuming that the 4 sets of data are independent.
    * The field used to split the datasets depends on your data; in my case I am assuming the quarter of the year.
    Last edited by rsdp; 10-06-2017 at 02:18 AM.

  7. #7
    Join Date
    May 2014
    Posts
    358

    Default

    What is the speed (rows per second) of the Delete step when you run it? You probably won't get much more than 1000-3000 r/s, but that could be sufficient if you have just a few million rows to delete.
    Is there a composite index with all the keys on the table?
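
    For reference, a hedged sketch of such an index, assuming a hypothetical table whose Delete step compares on two key columns; a single composite index covering every compared column lets each DELETE resolve through one index lookup instead of a scan:

        -- Hypothetical: one composite index over all columns the Delete
        -- step compares on (customer_id and load_date are placeholders).
        CREATE INDEX idx_delete_keys ON my_table (customer_id, load_date);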

  8. #8
    Join Date
    Mar 2015
    Posts
    190

    Default

    No sir, we don't have a composite index with all the keys on the table.

    If I create a composite index, can I then drop the indexes on the individual columns?

    Thanks for your support, lukfi

  9. #9
    Join Date
    Sep 2011
    Posts
    152

    Default

    You can generate all the delete statements, put them in a file, and run that file (or run the queries from that file). That will improve your delete operation.

    Also, from the DB side you can change your configuration, for example in the case of MySQL: innodb_buffer_pool_size = 2GB.
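
    For example, a minimal my.cnf sketch of that setting (the 2GB value comes from the post above; size it to your server's available RAM):

        # my.cnf -- enlarge the InnoDB buffer pool so the index pages
        # touched by the deletes stay in memory
        [mysqld]
        innodb_buffer_pool_size = 2G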

  10. #10
    Join Date
    Mar 2015
    Posts
    190

    Default

    Hi rajeshbcrec,
    As I said in my previous post, we are not hardcoding delete statements; this is part of an incremental load that also deletes data.


    Thank you

  11. #11
    Join Date
    Sep 2011
    Posts
    152

    Default

    I am not asking you to hardcode the delete statements; you can generate them on the fly (dynamically), save them to a file, and execute that file.
    You only hardcode the syntax of the delete statement and fill in the conditions, either using metadata injection or any other method.
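
    A minimal sketch of that generation step in MySQL, assuming a hypothetical staging table rows_to_delete that holds the keys from the incremental load; SELECT ... INTO OUTFILE writes the generated statements to a file on the database server (it requires the FILE privilege):

        -- Hypothetical: emit one DELETE per key into a script file.
        -- rows_to_delete and target_table are placeholder names.
        select concat('delete from target_table where id = ', id, ';')
        into outfile '/tmp/generated_deletes.sql'
        from rows_to_delete;

    The resulting file can then be run in one go, e.g. mysql mydb < /tmp/generated_deletes.sql.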

  12. #12
    Join Date
    Mar 2015
    Posts
    190

    Default

    Hi Rajesh,


    It is an old thread, but I have now tested thoroughly from my end. I am not able to attach any images, not sure why; please help with this, and I will post my images as well to explain the issue properly.


    Case 1: tested with delete statements. I generate the delete statements with a SQL query and pass them to the Execute SQL script step; it took 1 minute 38 seconds.
    [Image: execute_sql_step.png (attachment failed)]

    Case 2: tested with the Delete step; it took 32.3 seconds.
    [Image: delete step1.png (attachment failed)]

    I hope the second approach is very fast compared to the first one.

    Please let me know your suggestions. I hope I followed your suggestions, but there is no luck.

    Thank you

  13. #13
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You never explained why you need to do a row-by-row delete.

    If you are doing a Row-by-row delete, you will have 2M round-trips to the database. Even at 2ms round-trip, that is approximately 4000s of network time that you are using. That works out to 66 minutes, or just over an hour.

    Having one index with all of the columns that you are using to identify the row to be deleted will also impact the delete performance.

    Finally, your post doesn't make logical sense. First you state that approach 1 is slower than approach 2, then you say that you hope 2 is faster than 1, even though you just stated that it was. Then you also state that there is no luck. What are you trying to say? That the rows aren't even deleted?

  14. #14
    Join Date
    Aug 2016
    Posts
    290

    Default

    Have you tried turning on batched queries? Sending a large batch of queries in one go can give a great performance boost over paying a round trip per single query.
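    As a hedged sketch of what that can look like for a MySQL connection (host, port, and database names are placeholders): MySQL Connector/J exposes the rewriteBatchedStatements option, which lets the driver combine batched statements into fewer round trips, and in PDI it can be added on the database connection's Options tab.

        # Hypothetical JDBC URL with Connector/J batching enabled
        jdbc:mysql://dbhost:3306/mydb?rewriteBatchedStatements=true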
    Quote Originally Posted by rajeshbcrec View Post
    you can create all delete statement and put it in file and run that file or run queries from that file. that will improve your delete operation.
    Why would queries from a file improve performance compared to queries from memory? I'm asking because, from the server's perspective, it makes little difference where the client stores the queries.
    Last edited by Sparkles; 06-26-2018 at 02:24 AM.

  15. #15

    Default Commit size to be increased

    Increase the commit size to 10000. It should help you. Try it and let us know.


    Quote Originally Posted by ranala View Post
    Hi,

    I am using PDI 6.1, MySQL, and Java 1.7.

    I am using a Table Input --> Delete step to delete data by comparing indexed columns. It is very slow at the Delete step; I have 2,000,000 records that need to be compared and deleted, and I cannot figure out where I am making a mistake.

    Could you please tell me which is the better approach: the Delete step or the Execute SQL script step?
    Regards,
    Dileep

  16. #16
    Join Date
    Mar 2015
    Posts
    190

    Default

    Thank you, Dileep. I will test it with the commit size.

    Also, I am not able to attach any images or my sample transformations or jobs when posting my issues.

    I searched YouTube and Google regarding the attachment issue; previously I used to add my sample attachments (.ktr, .kjb), but not now. Do you see the attachment icon?
