Let's say I need to generate, handle, and validate some data (file names, dates, times) in separate steps before passing it to the core ETL transformation.
The job has the following steps/transformations:
Start --> T1 (Generate Result Rows) --> T2 (Validate Result Rows) --> core ETL --> Success
T1 generates one or more result rows.
T2 validates the result rows, for example by checking a date-time field. Only rows that pass validation are copied to the result rows.
The core ETL then uses the result rows.
The problem is that when T2 receives result rows that fail validation, the core ETL receives all the result rows from T1 instead of the ones from T2! How can I avoid this? How can I delete all result rows based on logic in a transformation?
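To make the symptom concrete, here is a toy Python model of what I think I am seeing. This is not Kettle code: every function name below is hypothetical, and the rule in `run_transformation` (the result rows are only replaced when a transformation copies at least one row to the result) is my assumption about the behaviour, not something I have found documented.

```python
from datetime import datetime


def t1_generate():
    # Hypothetical stand-in for T1: one row with a valid timestamp, one without.
    return [
        {"file": "a.csv", "ts": "2020-01-01 10:00:00"},
        {"file": "b.csv", "ts": "not a date"},
    ]


def is_valid(row):
    # Hypothetical stand-in for the date-time check done in T2.
    try:
        datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S")
        return True
    except ValueError:
        return False


def run_transformation(previous_result, output_rows):
    # ASSUMPTION: the job keeps the previous result rows unless the
    # transformation copied at least one row to the result.
    return output_rows if output_rows else previous_result


def run_job(rows_from_t1):
    # T1: generate result rows.
    result = run_transformation([], rows_from_t1)
    # T2: keep only validated rows and copy them to the result.
    validated = [row for row in result if is_valid(row)]
    result = run_transformation(result, validated)
    # These are the rows the core ETL would read.
    return result
```

Under that assumption, a run where some rows pass validation hands only those rows to the core ETL, but a run where no row passes hands over everything T1 produced, which matches the problem described above.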
Result rows are an enigma: highly useful, but hardly documented at all and very difficult to debug. It's a mystery to me how result rows work, which steps/jobs they work between, and when the list is added to or replaced.
I mean, result rows are the only way to pass lists around inside the program, yet it is confusing to work out how they behave and what the scope of the list is. Why hasn't this been documented?