PDA

View Full Version : Datawarehouse



kettle_anonymous
07-28-2006, 03:08 AM
I'm developing a datawarehouse and I'd like to use kettle in order to populate my tables.


I have lange cvs files (30/40MB) with a date attribute. I have tryed to configure spoon for my transformations and now I'm able to populate my db (with some problems).



I have some question about:



1) First. I have never worked with datawarehouse and I have some problems with the "key mappings". In the source database I have some keys in my datawarehouse another keys (ID int4). How can I define my transformations in order to insert/update correctly my data? Where can I find good documentation about this problem?



2) if, in my source database, I have for example 2 fields Field1 (string) and Field2 (string) can I create a Field3 with a concatenation operation:



example: Field1 = cat Field2 = dog --> Field3 = catdog



3) My csv files are very large and incremental files. If I have already inserted the data up to 2006/05/01, I'd like to read only the data with date > 2006/05/01. How can I do this with a csv input file?



Thanks in advance

MattCasters
07-28-2006, 03:17 AM
Well,

I always advice people to use fact tables and slowly changing dimensions for this sort of thing. SFTW for more info.

However, if you just want to load a single table, use the "Insert/Update" step and specify the key to look up with.

You can concatenate with the Calculator step: A+B not only works on numerical data, but also on Strings.
(A*B also works with a number and a string BTW: 3*A = AAA :-))

You can add a Filter after the Text File Input step to limit the dates.

HTH,

Matt

kettle_anonymous
07-28-2006, 06:58 AM
thanks a lot


but.. emh... excuse me for my ignorance: what is SFTW?

kettle_anonymous
07-28-2006, 09:05 AM
It's a typo. Matt meant STFW, which stands for "Search The F****** Web". It's an analogy to the well-known RTFM, which stands for "Read The F****** Manual".