View Full Version : IBM DB6
02-07-2006, 08:54 PM
We are building a datamart which imports data from various database sources
- Excel Files
- IBM DB6
We could not find any ODBC or JDBC drivers for DB6. Does Kettle support DB6 connectivity? Is it able to handle all the above datasources?
I would also like to know if how is Kettle in terms of performance and robustness? How does it handle disaster recovery? Basically all newbie queries to get the initial comfort level.
02-07-2006, 10:37 PM
I'm not familiar with DB6, is this some kind of Lotus Notes variant of DB2?
I can't find a lot about it on Google.
If you have a ODBC or JDBC driver then you can use the Generic driver to source the data. If the JDBC driver is freely distributable, consider pointing me to some documentation and we'll add full support for it ;-)
The other data sources you mention are supported although formula result support is a problem in Excel files (being worked on).
In terms of performance, by design, Kettle tries to minimize I/O in favor of CPU based operations. There are some known issues with performance using Scripting (java script) and when using a large number of steps (depends on # of CPU's you have) in a single transformation. Both issues are being addressed in version 3.0.
Generally speaking, in my experience, most of the time, the database is the limiting factor for the performance of Kettle. As CPU's get faster and more are available, this is unlikely to change in the coming years.
Disaster recovery of data warehouses with some kind of point-in-time recovery of loaded transformations is not (yet) supported. The things I've seen in other tools on this subject doesn't really *inspire* me to go that route. ;-)
Here is advice that should work for any ETL tool, although it can hardly be described as advice to a newby :-)
1) Generate a batch ID for every monthly, nightly or hourly job. (keep in a table somewhere)
2) store this batch in the fact tables and dimensions to keep track of what has changed
3) SC Dimension updates are re-startable anyway (nothing will get changed)
4) Delete values from the fact tables with a certain batch ID before the load (SQL step). This makes the transformation re-startable.
5) you can use Chef to design jobs that re-start on failure.
The next majar version of Kettle (3.0) should have support for clustered databases using HA-JDBC. That should give you guaranteed uptime and load balancing if your clients support this.
As far as the meta-data is converned, you can automate exports of the repository using the Kettle tool pan using the /exprep=filename.xml option.
Hope this helps!
02-07-2006, 11:37 PM
I have downloaded Kettle and now trying it out.
The excel files don't have any formula, so no issue on that. But for DB6 even I could not find anything useful on google. It may work with a DB2 driver. We will try.
We need to perfrom the transformations at least twice a week. Is there any task scheduling function in Kettle?
Our source tables and destination tables have different structures. Is there any column mapping feature?
Is kettle able to handle double byte data? (Chinese, Korean)
Thanks for the great tool and the answers.
02-07-2006, 11:47 PM
Task scheduling (Waiter) is scheduled (pun intended) for version 3.0 and will ride on top of an application server like JBoss. In the mean time, look at the Pan documentation (pan-2.2.pdf) on how to schedule on Windows and/or Unix systems.
>Our source tables and destination tables have different structures. Is there any column mapping feature?
Yes, of course. For example you can calculate new values or select only certain values, do lookups, etc.
I suggest putting a "Select Values" step before the "Table Output" step to the target database table.
>Is kettle able to handle double byte data? (Chinese, Korean)
Yes, this is confirmed. "Text File Input" and "Text File Output" allow you to specify the Encoding to use, for example UTF-8, UTF-16, any codepage that is available on your system or supported by Java.
Internally, Kettle uses UTF-8, like most if not all Java applications.
Also, efforts are underway (just started) to localize the Kettle GUI and messages into Chinese and other languages.
All the best,
02-08-2006, 12:17 AM
Thanks once again.
Support for double byte was a major consideration for us. Good to know that Kettle has full support for it.
How about Transformations? (mm/dd/yyyy to dd/mm/yy) etc? Is it possible to configure such transformations or do we have to write code?
Hope u dont mind so many qns.
02-08-2006, 11:21 AM
These string 2 date transformations are a standard part of the text file input step.
In fact, you can specify any possible mask there.