Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Best way to merge data

  1. #1
    Join Date
    Jun 2010
    Posts
    114

    Default Best way to merge data

    I have a 1000 rows in excel file and need to upload the data into a table A while generating primary key value.
    To do that, I need to figure out the maximum primary key value on table A and then create a sequence number(maximum id +1) depending on that for each row.

    What is the best Joint possible in this scenario?

    I have tried Table input step - To find the maximum primary key value
    + used Set variables to store the value of maximum primary key
    + used Join Rows(Cartesian product) to use this max key value with the Excel input step
    + used Add Sequence step to grab the maximum primary key assigned in Set variables
    Please find the attached screenshot

    Is this the correct way? Is there a better way to do this?
    Attached Images Attached Images  
    Thanks,
    Om
    ----------------------
    Spoon version -4.4.0

  2. #2

    Default

    Om

    Normally people let their DB generate their primary keys, are you sure you can't do this? e.g. use a PostgreSQL SERIAL data type for the field and let the DB populate it for each insert. Then you can use the Table Output step which can even return the value of the generated key for you.

    But if your DB can't generate automatically incrementing primary keys, then something similar to your approach may work (but you'll need to wrap it in a job to control the flow). Your job should call 2 transformations, the first of which will use Table Input and Set Variables to store the value of the maximum primary key in a job variable, e.g. YOUR_MAX_KEY. Then the second transformation can get your data from Excel and use the Add Sequence step with ${YOUR_MAX_KEY} for the "Start at value" step property, followed by a Table Output (remembering to specify the primary key field).

    Have fun!
    Tony, Cloud2Land.com

    Get the latest snapshot Pentaho Data Integration (Kettle) builds from SVN on my blog!
    http://www.cloud2land.com/

  3. #3
    Join Date
    Jun 2010
    Posts
    114

    Default

    Thanks for the quick reply Tony
    Thanks,
    Om
    ----------------------
    Spoon version -4.4.0

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Alternate way (if you cannot use the DB to generate key), would be to do
    Input 1 (Excel File)
    Input 2 (Select Max(key) ...)

    Cartesian Join of Input 1 and Input 2 (Adds Max(key) to every row)

    Sequence (add increment to stream)

    Calculator ( NewPK = Max(key) + Sequence )
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

  5. #5
    Join Date
    Jun 2010
    Posts
    114

    Default

    awesome...
    I had a little confusion regarding Cartesian Join but your reply cleared it
    Thanks Gutlez
    Thanks,
    Om
    ----------------------
    Spoon version -4.4.0

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.