Hitachi Vantara Pentaho Community Forums

Thread: Input specific fields into DB table

  1. #1
    Join Date
    Nov 2013
    Posts
    24

    Default Input specific fields into DB table

    My input CSV file has this format: customer_id (string), type 1, type 2.
    And my DB table person_table has this format:
    person_id (int), person_index (int), customer_id (string), timestamp, where person_id is auto-generated.

    I used a CSV file input step and a Table output step with "Specify database fields" checked.

    But I get this error: Error updating row. Field 'person_index' doesn't have a default value.

    Any ideas why?

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Sure: Column person_index is defined as NOT NULL and no default constraint exists, nor does your transformation offer a value.
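    For illustration, here is a minimal MySQL sketch of that situation (table and column names assumed from post #1; the error appears under strict SQL mode):

    Code:
    CREATE TABLE person_table (
      person_id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      person_index INT NOT NULL,   -- NOT NULL and no DEFAULT clause
      customer_id  VARCHAR(255)
    );

    -- Omitting person_index triggers the error:
    INSERT INTO person_table (customer_id) VALUES ('abc');
    -- ERROR 1364 (HY000): Field 'person_index' doesn't have a default value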
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Nov 2008
    Posts
    777

    Default

    Assuming you have your database table set up to auto-generate the id, you need to put the name of the auto-generated field (person_id or person_index?) in the "Name of auto-generated field" text box on the "Main options" tab. Then leave that field out of the list of "Fields to insert" on the "Database fields" tab.
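    (If the table isn't set up that way yet, a minimal MySQL sketch would be the following - this assumes person_id is the intended auto-generated column and is already the primary key.)

    Code:
    ALTER TABLE person_table MODIFY person_id INT NOT NULL AUTO_INCREMENT;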
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  4. #4
    Join Date
    Nov 2013
    Posts
    24

    Default

    Thanks for the pointers! Looks like there IS a unique constraint on the person_index field. It is not auto-generated (the person_id field is), but it should be an incrementing value. Can I use Kettle to do this? I want person_index to run from 1 to N, and if I run the transformation again to insert more rows, I want it to start at N+1 (not start over at 1).

  5. #5
    Join Date
    Nov 2008
    Posts
    777

    Default

    Yes, it can be done in Kettle. What I would do is put a Table input step with a SQL statement like the one below in a side branch and then use a Cartesian Join step to append its single result row to every row in your main branch. After that, insert an Add sequence step and a Calculator step to add previous_max_person_index to the counter created by the sequence. The result of that calculation becomes the new person_index for each row added. Make sense?

    Code:
    SELECT IFNULL(MAX(person_index),0) AS previous_max_person_index FROM person_table;
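    To make the arithmetic concrete, here is a hypothetical all-SQL equivalent of that branch (illustration only - it assumes the CSV rows were loaded into a staging table named csv_staging, which is not how Kettle actually executes it):
    Code:
    -- capture the current maximum once, like the side branch does
    SET @base = (SELECT IFNULL(MAX(person_index), 0) FROM person_table);
    SET @n = 0;
    -- each incoming row gets previous_max + sequence counter
    INSERT INTO person_table (person_index, customer_id)
    SELECT @base + (@n := @n + 1), customer_id
    FROM csv_staging;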
    Last edited by darrell.nelson; 12-13-2013 at 01:30 PM. Reason: Added IFNULL(,) function to return a value of 0 if the table is empty.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  6. #6
    Join Date
    Nov 2013
    Posts
    24

    Default

    Darrell - it makes complete sense. I am quite new to Kettle and am trying to figure out how to do the side branch. Right now, the only steps I have are a CSV file input and a Table output. Where would I create the branch to include the Table input step with the custom SQL you mentioned?

  7. #7
    Join Date
    Nov 2008
    Posts
    777

    Default

    I am envisioning something like this leading up to your Table output step. In the Join Rows (cartesian product) step, select the CSV file input step as the "Main step to read from" and there's no need to enter anything in "The condition" box - the default condition should work fine.

    [Attachment: cartesion_product.jpg]
    Last edited by darrell.nelson; 12-13-2013 at 04:06 PM.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  8. #8
    Join Date
    Nov 2013
    Posts
    24

    Default

    Thanks Darrell. That worked fine! However, when I try to insert multiple records for the same customer_id, I get this error: 'Duplicate entry for key person_index_UNIQUE'. How do I get past this problem?

  9. #9
    Join Date
    Nov 2008
    Posts
    777

    Default

    Are all the records coming directly from the CSV file? Or are you duplicating some of the rows and thus reusing the person_index that comes out of the sequence calculation?

    Are you by chance using MySQL Workbench or a similar database tool? If so, can you copy/paste the SQL "CREATE ..." statement for your table into your next post?
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  10. #10
    Join Date
    Nov 2013
    Posts
    24

    Default

    All the records are coming from the .csv file. The .csv file looks like this:

    Code:
    Customer_id Feature1 Feature2 Feature3...
    Andy xxx yyy
    Bob eee gggg
    Andy rrr 89485

    Yes, we are using MySQL.

    Code:
    CREATE TABLE personids (
      person_id INT NOT NULL AUTO_INCREMENT,
      person_index INT NOT NULL,
      flag BINARY NOT NULL DEFAULT 0,
      customer_uid VARCHAR(255) NULL,
      stamp_created DATETIME NULL DEFAULT now(),
      stamp_updated DATETIME NULL DEFAULT now(),
      PRIMARY KEY (person_id),
      UNIQUE INDEX person_index_UNIQUE (person_index ASC)
    );

  11. #11
    Join Date
    Nov 2013
    Posts
    24

    Default

    Darrell,

    I am actually getting the 'Duplicate entry for key person_index_UNIQUE' error whenever I run the transformation more than once, even when the input .csv does not have multiple records for a given customer_uid.

  12. #12
    Join Date
    Nov 2008
    Posts
    777

    Default

    Please use a CSV output step in place of the Table output step so you can see which keys are duplicated. Perhaps your sequence calculation is not functioning as expected.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  13. #13
    Join Date
    Nov 2013
    Posts
    24

    Default

    This is the text file output.

    Code:
    customer_uid;previous_max_person_index;person_index;person_index_1
    Andy;5;1; 6.0
    Betty;5;2; 7.0
    Carol;5;3; 8.0
    Dan;5;4; 9.0
    Gregor;5;5; 10.0

    I get the same values even when I run the transformation multiple times.

  14. #14
    Join Date
    Nov 2008
    Posts
    777

    Default

    You have a couple of errors in your transformation.
    1. Do you see that your last two fields are person_index and person_index_1? The first person_index field (ranging from 1 to 5) comes from the Add sequence step; rename it to counter there. Then, in the Calculator step, add counter and previous_max_person_index together. The second field appears because you are creating two person_index fields in different steps, so Kettle renames the second one to person_index_1 to avoid a naming collision. Thus, your database is getting the wrong one - the raw sequence instead of the calculated value.
    2. Do you also see that person_index_1 is of type Number? Change it to Integer type in your Calculator step.

    When you fix those things, your text file should look like this. Then, try going to the database again.
    Code:
    customer_uid;previous_max_person_index;counter;person_index
    Andy;5;1;6
    Betty;5;2;7
    Carol;5;3;8
    Dan;5;4;9
    Gregor;5;5;10
    Last edited by darrell.nelson; 12-18-2013 at 04:02 PM.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  15. #15
    Join Date
    Nov 2013
    Posts
    24

    Default

    Thanks Darrell. I just realized the conflict too. I gave the calculated field a new name and it works now! If I have to read that Auto Increment field person_id from this table and populate another table, I would just have to create another transformation, right?

  16. #16
    Join Date
    Nov 2008
    Posts
    777

    Default

    You should be able to get the Auto Increment field back from the database by entering person_id in the "Return auto-generated key" box on the "Main options" tab of the Table output step.
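    Under the hood this is conceptually the same value MySQL reports for the last insert - a hedged sketch, with made-up example values:
    Code:
    INSERT INTO personids (person_index, customer_uid) VALUES (11, 'Helen');
    SELECT LAST_INSERT_ID();  -- the person_id generated for the row just inserted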
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)
