PDA

View Full Version : DynamoDB: Time Dimension table with MERGE



ngoodman
11-21-2009, 05:52 AM
So, even with my disclaimer note on the last blog, the DynamoDB developers slapped me around a bit for suggesting using a view for a Time Dimension (http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-built-in-time-dimension-support/). The Time Dimension is the most important dimension table and should be an actual table, not a view. Creating the table allows us to perform all kinds of optimizations like star joins, bitmap indexes on attributes, etc. Probably wouldn’t be that big of a deal for a tiny fact table (< 5million records) but you’ll want CREATE TABLE if you want good performance.

Good news is, that we can use our exact same table function (with fiscal year offset)


select * from table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3))

to populate and keep our Time Dimension TABLE up to date.

If you use a TABLE, it’s 2 steps:



CREATE TABLE : “dim_time”
POPULATE TABLE : “merge into dim_time”

We’ll be using another great tool in the DynamoDB / LucidDB toolkit, the MERGE (http://pub.eigenbase.org/wiki/LucidDbUpsert) statement. The MERGE (http://pub.eigenbase.org/wiki/LucidDbUpsert) statement is a logical UPSERT. It checks to see if key is already present. If it is, we UPDATE the table. If it isn’t, we INSERT it into the table. I’ll go into more detail at some point in the future as MERGE is crucial for keeping dimensions up to date.

Let’s create our Time Dimension table:

create table dim_time ( FISCAL_YEAR_END_DATE DATE , FISCAL_YEAR_START_DATE DATE ... ABBREVIATED ... , TIME_KEY DATE , TIME_KEY_SEQ INTEGER ,constraint dim_time_pk primary key (day_from_julian)); NOTE: We’re abbreviated the statements, but all the columns are used in the actual scripts. We also should add bitmap indexes on YR, MONTH, etc columns.

We’ve now got a TABLE that matches the VIEW we created in the previous blog. We’ve made day_from_julian as our PK, and we’ll use this date as our key for the MERGE statement. We can run this query as many times as we like and it will always just keep our “dim_time” table up to date.

merge into dim_time using (select * from table(applib.fiscal_time_dimension (2000, 1, 1, 2010, 12, 31, 3))) src on dim_time.day_from_julian = src.day_from_julian when matched then UPDATE set FISCAL_YEAR_END_DATE=src.FISCAL_YEAR_END_DATE ,FISCAL_YEAR_START_DATE=src.FISCAL_YEAR_START_DATE ... ABBREVIATED ... ,TIME_KEY=src.TIME_KEY ,TIME_KEY_SEQ=src.TIME_KEY_SEQ when not matched then INSERT (FISCAL_YEAR_END_DATE , FISCAL_YEAR_START_DATE ... ABBREVIATED ... , TIME_KEY , TIME_KEY_SEQ) values( src.FISCAL_YEAR_END_DATE , src.FISCAL_YEAR_START_DATE ... ABBREVIATED ... , src.TIME_KEY , src.TIME_KEY_SEQ);

More... (http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-time-dimension-table-with-merge/)