matt_mcgill
03-10-2006, 09:24 AM
It appears that the Insert / Update task does not function the way I would expect it to, so I will explain my attempted usage here and hope that someone can point out to me the correct way to carry out the operation I'm attempting. Note that I'm doing this as an exercise, it's not actually part of an ETL that's being used in a production environment.
I have a source table containing data about all employees that have worked for a company, named <code>employee_history</code>. Each employee is identified by a 'pidm'. I want to maintain a list of active employees in a table on a separate database (we'll call that table <code>current_employees</code>) The <code>current_employees</code> table has an auto-generated primary key named id, which is different from pidm. When new employees show up in <code>employee_history</code>, I want my ETL to insert records in <code>current_employee</code> for those employees. When an existing employee's name changes in <code>employee_history</code>, I want the ETL to update the corresponding record in <code>current_employees</code> with the new name. Seems to me like the Insert / Update task is what I should be using.
So I come up with a transformation consisting of the following three tasks:
<pre>
Table input ----->----- Add sequence ----->----- Insert / Update
</pre>
The table input task SELECTs the pidm, firstname, and lastname fields from <code>employee_history</code> where the status field indicates that the employee is active. The Add sequence task uses a sequence defined in the database (to ensure uniqueness across multiple runs of the ETL) to add an 'id' field to each looked-up record. The Insert / Update field uses 'pidm' as the lookup key, and uses firstname and lastname as the update fields. The id field should not be an update field, because it is the primary key on <code>current_employees</code>. But it should obviously be included in an insert statement when no existing record for the given pidm is found. (Ignore the fact that employees which are deleted in <code>employee_history</code> will not be deleted in <code>current_employee</code>).
That seems to make sense to me, but the Insert / Update task doesn't appear to work that way. When I tried the above scenario, the very first insert failed because only the update fields were included in the insert. I don't understand the reason for this. If I were to replace the Insert / Update task with a Table output task, all of the fields would be automatically included in an insert. I would have to explicitly filter them out if I had fields which I did not want to be inserted. Why is it that this is not the case with Insert / Update? The way it appears to be implemented, every single field in the target table which has a NOT NULL constraint would have to be listed as an update field, or no Insert would ever actually succeed.
Am I just confused? I know that what I'm trying to do doesn't make a whole lot of sense, but it seems like I should be able to do it.
Confused,
-Matt McGill
I have a source table containing data about all employees that have worked for a company, named <code>employee_history</code>. Each employee is identified by a 'pidm'. I want to maintain a list of active employees in a table on a separate database (we'll call that table <code>current_employees</code>) The <code>current_employees</code> table has an auto-generated primary key named id, which is different from pidm. When new employees show up in <code>employee_history</code>, I want my ETL to insert records in <code>current_employee</code> for those employees. When an existing employee's name changes in <code>employee_history</code>, I want the ETL to update the corresponding record in <code>current_employees</code> with the new name. Seems to me like the Insert / Update task is what I should be using.
So I come up with a transformation consisting of the following three tasks:
<pre>
Table input ----->----- Add sequence ----->----- Insert / Update
</pre>
The table input task SELECTs the pidm, firstname, and lastname fields from <code>employee_history</code> where the status field indicates that the employee is active. The Add sequence task uses a sequence defined in the database (to ensure uniqueness across multiple runs of the ETL) to add an 'id' field to each looked-up record. The Insert / Update field uses 'pidm' as the lookup key, and uses firstname and lastname as the update fields. The id field should not be an update field, because it is the primary key on <code>current_employees</code>. But it should obviously be included in an insert statement when no existing record for the given pidm is found. (Ignore the fact that employees which are deleted in <code>employee_history</code> will not be deleted in <code>current_employee</code>).
That seems to make sense to me, but the Insert / Update task doesn't appear to work that way. When I tried the above scenario, the very first insert failed because only the update fields were included in the insert. I don't understand the reason for this. If I were to replace the Insert / Update task with a Table output task, all of the fields would be automatically included in an insert. I would have to explicitly filter them out if I had fields which I did not want to be inserted. Why is it that this is not the case with Insert / Update? The way it appears to be implemented, every single field in the target table which has a NOT NULL constraint would have to be listed as an update field, or no Insert would ever actually succeed.
Am I just confused? I know that what I'm trying to do doesn't make a whole lot of sense, but it seems like I should be able to do it.
Confused,
-Matt McGill