PDA

View Full Version : The Weekly Kettle Tip: Quick Kettle Pointers



MattCasters
04-14-2006, 05:30 AM
This weeks Kettle tip was not written by me or one of the Kettle developers. It comes from people in Daytona Beach (FL) that have been using Kettle for a while now. I'm reposting their experiences and pointers unchanged (with permission) in the hope that it will help others.
------------------------------------------------------
Pointers using Kettle.
Setting
We were new to using Kettle and want to use kettle as an ETL tool to load a
datawarehouse from 50 different sources.

Mapping
1st misconception - Assumed mapping meant mapping one field to another (i.e.
pulling data from a source table and mapping it to a destination table).

To peform this type of Mapping use the Select Value Step (this allows you to
extract data from source and pass it on to the successor step (i.e. map
input field name to output field name)).
Constant values
Use of Constant Step - allows you to map a fixed value to an output step
(i.e. say you have a field STATE and you want to put FL in each record in
the destination table, you would use a constant step).

In the constant step you can map many constants/fixed values with one step.
Table Output
Not all field in the destination table have to have a value put into them.

The tool may give you warnings (in the validation step), but its not
necessary to have every field in the destination have a value being passed
it.
Commit size
To improve performance of transformations, add the value to the commit size
in the table output step.


Example:



If commit size is specified an appropriate size in the "Table Output" step,
this will cause major boost to performance of a transformation.

We were testing with 5000 records, using the default size of 0, it took 160 seconds.

Increasing the size to 1000 made the process run in 3.5 seconds.



------------------------------------------------------


I hope you found these tips useful and if you have others, feel free to post them here, or send them over to me.



Kind regards,

Matt

kettle_anonymous
07-20-2006, 11:16 PM
Hi Matt, We had a requirement of merging multiple columns into a single column in the output. For instance in the table input, the hour, min and time are stored in separate columns. We need to merge all three together with another column that contains only date and put the result in a single output column. Could you let me know how to go about this? I tried with the Execute SQL Script step with a to_date command, but it didn't work out.
Thanks, Pradeep

kettle_anonymous
07-23-2006, 07:56 PM
Hi Matt,
Thanks for tips.Questions did align with what we have requirements ,but the answers werent understandable.
when there is a need to pull data from muliple datasources(tables present in different kind of databases),what is the
approach a developer is expected to take ?

MattCasters
07-24-2006, 03:35 AM
Just make sure that before mixing the rows from the different sources, the layout of the rows are exactly the same. Use "Select Values" steps to do so.
When in doubt, run in safe mode.

Matt

superdx
06-14-2007, 02:49 AM
Hi, I am really interested to know specifically how Constant Values is done. I am using Kettle 2.50 and I don't see any step called Constant Values. There's an Add constants, but I'm not sure how it works.

Basically I have a stream of data, when finished processing I would like to update all the columns which are blank or null to be of a fixed value that I would predefine (or hardcode, it doesn't really matter). Right now it's trivial to implement with a single SQL statement, but our end users wouldn't be able to accomplish this. It would be best if they could accomplish the same result using the graphical tools provided in Kettle.

Thanks in advance for any help you could provide!

sboden
06-14-2007, 05:06 AM
Use calculator step with the NVL functionality e.g.

Regards,
Sven

superdx
06-14-2007, 07:39 PM
Hi,

Do you mind elaborating? NVL requires two fields. I'm looking to insert a constant. Do I hardcode a value into Field B?

sboden
06-15-2007, 02:06 AM
Naa... thinking too much we follow Oracle in functionality :D

Attached a transformation which I think does what you want using Value mapper. You can preview each of the steps to see how it works.

Regards,
Sven

clement
06-19-2007, 11:04 AM
Hi, I am really interested to know specifically how Constant Values is done. I am using Kettle 2.50 and I don't see any step called Constant Values. There's an Add constants, but I'm not sure how it works.

Basically I have a stream of data, when finished processing I would like to update all the columns which are blank or null to be of a fixed value that I would predefine (or hardcode, it doesn't really matter). Right now it's trivial to implement with a single SQL statement, but our end users wouldn't be able to accomplish this. It would be best if they could accomplish the same result using the graphical tools provided in Kettle.

Thanks in advance for any help you could provide!

Hi,

How "add constants" step works :
Starting with,
-"add constants" step input : 2 columns
And using,
-"add constants" step configuration : name="myNewColumn" type="String" value="Hello"
Will give you,
-"add constants" step output : 3 columns, column "myNewColumn" have all lines with "Hello".

a+, =)
-=clément=-