Hi there--

I'm reading some literature on ETL/ Data Warehousing best practices in general --- you can definitely fill a library.

Anyway, it's highly recommended to create a surrogate key/ dummy primary key for main fact tables (aka list of measures, numbers, etc ... as opposed to dimension tables which are mappings/ descriptions).

It's also recommended that this surrogate key is not a natural key + timestamp, or whatever. That's what I'm currently doing, so it looks like this author knows what he's talking about.



Anyway I'm trying to visualize how to actually a achieve this in Pentaho. Would it be best to look up the max surrogate key ID in the current table (say its 193832), and just add one, and increment from there during each load/ write of each row?

Or should there just be a universal Pentaho variable for it that is incremented by one every time the job is ran?

I'm just trying to visualize what will be most useful in the event of an ETL failure/ error/ rollback, etc. What's the easiest/ most straightforward way to generate surrogate keys in Pentaho?