PDA

View Full Version : Schema migrations?



avitous
01-23-2006, 11:56 AM
What schema migration capabilities are present in Kettle? I currently have a need to migrate a number of Sybase tables to MySQL, and the default assumptions made for the target table structures are less than ideal, e.g. boolean ("BIT") data types in Sybase are assumed to map to "VARCHAR(1)", but I would prefer to use "BIT" (synonym for "TINYINT(1)" in MySQL). I'd also like for the schemas themselves to be represented in the repository so the mapping would automagically be used if a different (empty) database target is used to execute the transformation.


I tried using a 'Select rows' step between the source table read and the target table write steps, specifying 'BOOLEAN' for the target write for the affected fields (in the 'metadata' tab), but the SQL generated for the target table still specifies "VARCHAR(1)".



If I modify the SQL for the target table to specify the desired "BIT" type, and start over with an empty target database (running the modified SQL to create target tables), it appears to work well; however, I'd like better repository-based management of schema migrations.



Am I missing something here? Is this something the 'Mapping' sub-transformation steps can accomplish for me? (I didn't see them described in the Spoon manual.)



I myself can of course manage the target schemas externally, but other potential users where I work will be looking for "point-and-click migration" functionality. We're evaluating SQLWays, which seems to work well for such tasks, but it's commercial, Windows-only (blecchh) and doesn't appear to support a separate job execution agent with any kind of sophisticated job management. Kettle has additional features we would like and is cross-platform and free.

MattCasters
01-23-2006, 12:27 PM
Hi Avitous,

You are not missing anything. I just missed the BIT datatype possibility in MySQL.
You see, most databases don't support booleans, so I usually map this to a Y/N char(1) or varchar(1).

I have to say that Kettle was designed to construct data warehouses, so in this context, it makes sense, but I guess schema migrations should be possible too ;-)

As for the second part of the question: did you try the table-copy wizard in Chef?
You would need a repository on MySQL or some other database, but it should allow you to do what you want:
Create / alter table, copy data, etc. The wizard generates the job for you to migrate the selected tables.
Let me know if you have any problems with that.

OK, I added the BIT data type to MySQL and added the supportsBooleanDataType() to the DatabaseInterface as well as the setValue() method in Database.

Grab the latest kettle.jar if you want to test this.

>Windows-only (blecchh)

How is Kettle holding up on your platform? (Mac, Linux, Solaris?) I know of some problems on Linux with data grids. That's one of the reasons why I'm re-writing the TableView widget.
Also, please be a bit patient if you want more complex database features such as manual changes to the output and input datatypes. Rest assured, it's coming in 3.0 later this year.
You see, it's not easy to find the right mix between user-friendly and configurable.

All the best,

Matt

P.S. I'm updating the manual for the 2.2.2 release this weekend.

avitous
01-23-2006, 12:58 PM
Way cool; thanks for the quick reply.

I should have specified 'BOOLEAN' instead of 'BIT', but both equate to the TINYINT(1) datatype in MySQL. BIT defaults to size of 1 (boolean value) but supports bitfields up to 64 bits in width.

I tried the 'Copy Table' wizard from Spoon, but it appeared to use the VARCHAR(1) datatype; this was with the 2.2.1 release of Kettle (before your change). Being very new to Kettle I haven't played around with anything but Spoon yet. I'll try it out with your changes shortly.

The other feature I'd want from a 'schema migration' is support for indexes (including primary keys), constraints and default values, as we want the target schema to match the source as closely as possible. Mostly we want to use Kettle as a migration tool with support for periodic synchronization from the Sybase source; with the initial number of affected tables I can manage the target schema definitions externally, though eventually (later this year) the number of tables will grow into hundreds spanning several databases with the intention of (eventually) dropping Sybase from the mix.

I've been running Kettle on Windows, but will be installing to Linux shortly; thus far it appears quite solid. Linux deployment will be mainly for the 'pan' engine and scheduled transformations (synchronizations); I'll be playing with the Spoon frontend from Linux as well but will have Windows as a fallback if issues appear with the GUI.

And no, this wasn't an "immediate feature demand"; I/we can certainly be patient given that it's (a) free software, and (b) already a very useful tool!

Thanks again,
Anderson

avitous
01-23-2006, 01:01 PM
I should have pointed out my deployment environment will be a mix of Windows XP and Linux (RedHat EL4) desktops and RHEL4/x86_64 servers.

Other migration tools tend to be Windows-based using ODBC, and we've found some issues with larger tables (> 8M rows tends to be a problem.)

MattCasters
01-23-2006, 01:03 PM
Well, Kettle uses SWT for the GUI part, and I had some trouble getting it to run on RHEL3/x86_64 last week.
I'm waiting for SWT 3.2 to try further. It still seems to be too bleeding edge at the moment.

Large data sets should not be a problem with Kettle, if the rollback of the source system can handle it that is ;-)

Cheers,

Matt

MattCasters
01-23-2006, 01:07 PM
Yeah, I know about the indexes, but it's actually (IMHO) a messy part of the JDBC story.
What I want to do is create new index-detectors as most JDBC drivers fail miserably at giving lists of indexes.
(don't ask, you don't want to know :-() I already do this for Oracle and a couple of other databases.

Same story for constraints etc. Over time these features will be added to Kettle, but at the moment, there are other priorities in the feature set.

What I suggest is to use Chef to "script" the migration and adjust the SQL generated by Kettle for the target platform. You can enter complete SQL scripts to do so.
That should work fine until better things pop up.

All the best,

Matt