Hitachi Vantara Pentaho Community Forums

Thread: Kettle loses rows

  1. #1

    Default Kettle loses rows

    I have this simple transformation:

    [Attached image: from_access_to_mysql.jpg]

    I would like to convert an MS Access DB to a MySQL one.

    In the "table out" step I have set 1000 as "commit size" and I have selected "use batch update for inserts". The Access DB has 500.000 records, but I lose 3% of them during the conversion. Sometimes, I get more than 500.000 rows in the MySQL db.

    I've tried changing the commit size and disabling the "use batch update for inserts" option, but I still have the same problem.

    Can someone help me?

    Thanks in advance.

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    What do you mean by "but I lose 3% of them"?
    Maybe they have invalid data, or data incompatible with the fields you have created in your MySQL table.
    Changing the commit size or the type of insert will not solve your problem, I guess.

    I would try to identify those lost records/rows and try to understand why they cannot be imported into MySQL.
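
    For example, something along these lines would show which ids are missing (just a sketch, assuming your table has a numeric id column; connection details and table/column names are placeholders):

    Code:
    # Sketch: compare primary keys between the Access source and the MySQL target
    # to find which rows went missing. Assumes a numeric "id" column; connection
    # strings and table names are placeholders.
    import pyodbc
    import mysql.connector

    access = pyodbc.connect(
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\data\source.accdb")
    target = mysql.connector.connect(host="localhost", user="user",
                                     password="secret", database="target_db")

    src = access.cursor()
    src.execute("SELECT id FROM source_table")
    source_ids = {row[0] for row in src}

    dst = target.cursor()
    dst.execute("SELECT id FROM target_table")
    target_ids = {row[0] for row in dst}

    missing = sorted(source_ids - target_ids)
    print(len(missing), "rows are missing in MySQL")
    print("first few missing ids:", missing[:20])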
    -- Mick --

  3. #3

    Default

    Thank you for your reply.

    Quote Originally Posted by Mick_data View Post
    What do you mean by "but I lose 3% of them"?
    I have 500,000 records as input, but only 485,000 records written to the MySQL table. The number changes every time I run my transformation with the "truncate table" option enabled: I get 486,743, 495,322, 480,655, 510,398 records and so on. Every time I run my transformation I get a different number of rows in the output db (sometimes fewer than the number of rows in the Access DB, sometimes more).

    Quote Originally Posted by Mick_data View Post
    Maybe they have invalid data, or data incompatible with the fields you have created in your MySQL table.
    I don't think so, because the number of records written changes every time.

    Quote Originally Posted by Mick_data View Post
    Changing the commit size or the type of insert will not solve your problem, I guess.
    No, it doesn't.

    Quote Originally Posted by Mick_data View Post
    I would try to identify those lost records/rows and try to understand why they cannot be imported into MySQL.
    It's a good idea. I'll try.

  4. #4

    Default

    I've run the same transformation twice and compared the two MySQL tables. The first one has 595,749 rows, the second one only 588,823. The two tables contain the same records, but the smaller one simply stops at row 588,823; the rows from 588,824 to the end are missing. It seems the transformation stopped before the end, but I can't understand why.
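
    A comparison like this is enough to see where the two runs diverge (rough sketch; the table names and connection details are placeholders):

    Code:
    # Sketch: compare two loads of the same data into two MySQL tables.
    # Table names and connection settings are placeholders.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="user",
                                   password="secret", database="target_db")
    cur = conn.cursor()

    for table in ("run1_table", "run2_table"):
        cur.execute(f"SELECT COUNT(*), MAX(id) FROM {table}")
        count, max_id = cur.fetchone()
        print(table, ":", count, "rows, max id", max_id)

    # Rows present in the first run but not in the second
    cur.execute("""
        SELECT COUNT(*) FROM run1_table a
        LEFT JOIN run2_table b ON a.id = b.id
        WHERE b.id IS NULL
    """)
    print("rows only in run1:", cur.fetchone()[0])

    conn.close()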

  5. #5
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Hi.
    Could it be that your MySQL server closes the connection automatically, maybe because of some timeout setting?
    I'm not an expert, so I'm just throwing out ideas!
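
    If you want to check, something like this prints the server settings that can cut off a long load (just a sketch; connection details are placeholders):

    Code:
    # Sketch: print the MySQL server variables that can interrupt a long-running load.
    # Connection details are placeholders.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="user",
                                   password="secret", database="target_db")
    cur = conn.cursor()
    cur.execute(
        "SHOW VARIABLES WHERE Variable_name IN "
        "('wait_timeout','net_read_timeout','net_write_timeout','max_allowed_packet')")
    for variable, value in cur.fetchall():
        print(variable, "=", value)
    conn.close()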

    If I were you, I would split the original data into smaller batches and load them one at a time instead of trying to load everything at once.
    -- Mick --

  6. #6

    Default

    Do you know how I can split the data using Kettle?

  7. #7
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    You can do it in different ways.
    If you have a row number/id, you can filter your data - first rows 1 to 100K, then 101K to 200K, and so on.
    I've seen something similar in a few posts on this forum, so try a quick search - if I have time I'll try to find those posts.
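
    Outside of Kettle the idea looks roughly like this (only a sketch; it assumes a numeric id column, and the connection details and table/column names are placeholders) - in Kettle you get the same effect with a "Filter rows" step on the id range:

    Code:
    # Sketch: copy the data in id ranges of 100,000 rows instead of one big load.
    # Assumes a numeric "id" column; connection details and names are placeholders.
    import pyodbc
    import mysql.connector

    access = pyodbc.connect(
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\data\source.accdb")
    target = mysql.connector.connect(host="localhost", user="user",
                                     password="secret", database="target_db")
    src = access.cursor()
    dst = target.cursor()

    CHUNK = 100_000
    src.execute("SELECT MAX(id) FROM source_table")
    max_id = src.fetchone()[0]

    for start in range(1, max_id + 1, CHUNK):
        end = start + CHUNK - 1
        src.execute("SELECT id, col1, col2 FROM source_table WHERE id BETWEEN ? AND ?",
                    (start, end))
        rows = [tuple(r) for r in src.fetchall()]
        if rows:
            dst.executemany(
                "INSERT INTO target_table (id, col1, col2) VALUES (%s, %s, %s)", rows)
            target.commit()   # commit each chunk before moving on
        print("ids", start, "-", end, ":", len(rows), "rows copied")

    access.close()
    target.close()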
    -- Mick --

  8. #8

    Default

    Maybe, as you said, the problem is in the MySQL configuration.

    I've split my data using "Filter rows". I converted the first 100,000 rows without any loss of data, and the same for the second batch (up to 200,000), the third, and so on.
    When I had 500,000 rows in my MySQL db, I tried to convert the last rows, but I lost the last ones.

    I also tried with more than 600,000 rows, and there was no problem with the first 580,000.

  9. #9
    Join Date
    Apr 2008
    Posts
    4,690

    Default

    Big question: What version of PDI are you using?
    There's a bug in one of them that can cost you rows when the number of rows exceeds the commit size.

  10. #10

    Default

    Thank you for your reply.

    I have tried Kettle 5.4.0.1-130 and 5.2.0.0, but I get the same problem with both.
