I have a transformation that loads over 800,000 merchants and pulls all of their transactions for the last 3 years, which ends up being tens of millions of rows. The transformation sorts, groups, and denormalizes the data, then stores summary rows in an output table. The output row count is significantly lower because the table only holds summarized data.

I have been having issues with the table output step. The transformation takes about 1 hour and 30 minutes before any rows reach the output step, and when they finally do, I repeatedly get a connection error and the transformation is stopped. I am not sure if this is related to all steps running in parallel, but I can see in the logs that the connection for table output is opened as soon as the transformation begins. The database is Postgres, and I do not know what its timeout limit is set to. Could this error be caused by the connection sitting idle for the entire time it takes rows to reach the output step? If so, is there a way to delay opening the table output connection until rows are closer to arriving?
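In case it helps with diagnosis: I have not yet checked the server-side timeout settings myself, but my understanding is that the relevant Postgres parameters could be inspected with something like the following (assuming a reasonably recent Postgres version; `idle_in_transaction_session_timeout` exists from 9.6 onward):

```sql
-- Server-side settings that could drop a long-idle connection
SHOW statement_timeout;                    -- 0 means disabled (the default)
SHOW idle_in_transaction_session_timeout;  -- 0 means disabled (default, 9.6+)
SHOW tcp_keepalives_idle;                  -- 0 means use the OS default
```

If the error turns out to be an idle-connection drop rather than a statement timeout, that would point toward either adjusting these settings or delaying the connection as asked above.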

Any help would be greatly appreciated! Please let me know if you need any other information from me to help resolve this. Thank you.