I seem to be losing my fields after an Execute SQL Script. I have a transformation that takes data from a csv file, does a few things, and then does an insert/update on a Postgres table. The transformation works fine without the Execute SQL Script step.

But I would like to add this step in order to delete data from the last couple of months before I do the insert/update. Recent data is very volatile, and sometimes rows of data (which are transactions) are entirely deleted from the operational data set. In this case, an insert/update would leave the old data, even if I run a new report for the same time period.

So that's why I want to run a simple delete operation before loading the data. Here is my SQL:

Code:
delete from transactioninfo where ship_date_key >= (select ship_date_key from ship_date_dimension where ship_date_dimension.ship_sql_date_stamp = to_date(extract(year from current_date - interval '2 months')||'-'||extract(month from current_date - interval '2 months')||'-01', 'YYYY-MM-DD'));
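
(As an aside, I believe the whole year/month concatenation could be shortened with date_trunc, which should give the first day of the month directly. I haven't tested this variant, so treat it as a sketch:)

Code:
delete from transactioninfo where ship_date_key >= (select ship_date_key from ship_date_dimension where ship_date_dimension.ship_sql_date_stamp = date_trunc('month', current_date - interval '2 months')::date);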


I learned that it is important not to include carriage returns (at least when connecting to Postgres). So the code above is typed into the Execute SQL Script step without any carriage returns.

What happens is that steps following the Execute SQL Script step complain about not being able to find fields that are present earlier in the stream. These fields *do* show up when clicking "Show output fields" for the Execute SQL Script and "Show input fields" for the following steps. So it would seem that these fields should be visible. And the fields are visible if I remove the Execute SQL Script.

Here are two cases that I tried:

1. At the beginning of my transformation, right after my Input from CSV step. (I noticed that it was easier to specify the file for the CSV step if it is the first step.) I do the deletion right after the CSV step and right before the next step, which is a data validator step that begins the transformation proper. I get an error that the data validator cannot find the field "transaction_id" (which is the field that is being validated).

The data validator step has no problem seeing this field when the Execute SQL Step is removed.

2. At the end of my transformation, right before my Insert/Update step.

Here are the specifics from the last case:
2009/06/22 11:42:10 - Insert / Update transactioninfo table.0 - ERROR (version 4.0.0-M1, build 10666 from 2009-06-20 12.36.34 by tomcat) : Unexpected error :
2009/06/22 11:42:10 - Insert / Update transactioninfo table.0 - ERROR (version 4.0.0-M1, build 10666 from 2009-06-20 12.36.34 by tomcat) : org.pentaho.di.core.exception.KettleStepException:
2009/06/22 11:42:10 - Insert / Update transactioninfo table.0 - ERROR (version 4.0.0-M1, build 10666 from 2009-06-20 12.36.34 by tomcat) : Field [transaction_id] is required and couldn't be found!

Insert/Update is using transaction_id as the lookup field. It complains that it can't find the field, but this step does show transaction_id as an input field, and everything works fine without the Execute SQL Step.

I've also tried editing the Insert/Update step by clicking on 'clear and add all' for both the lookup field and update field sections (and then deleting the fields from the lookup section that I don't need). So the Insert/Update step setup screen is able to see these incoming fields (including 'transaction_id').

What could I try?

Thanks for your time,
Matt

P.S.
I originally tried this with PDI 3.2.0-RC1, which has been the version I've used since it was released. This was build version 2009-04-27 10.36.25. The same results occur using 4.0.0-M1, build version 2009-06-20 12.36.34.

I am using PostgreSQL 8.3.5, compiled by Visual C++ build 1400.