View Full Version : Why Enhydra Octopus?

06-14-2005, 06:43 PM
Octopus' focus on JDBC is extremely limiting. Dealing with input data of any format - flat file, Excel, XML - from a variety of sources - directory, FTP, email, JMS - seems to be a more realistic requirement. We have been using Babeldoc for a long time, and are happy with it, but there probably are more options out there.


06-16-2005, 02:15 PM
We do not have a strong reliance on a specific ETL / EAI tool. We selected Octopus based on its strengths in RDBMS integration but are not restricted to only one tool. We will evaluate Babeldoc to see if it fills additional needs. Thank you for the recommendation.

If someone would like to volunteer to contribute an integration wrapper for Babeldoc, that would be great.

James Dixon
Chief Geek

07-12-2005, 05:11 AM
Have you all done any testing with Octopus? I wonder how it compares to commercial tools.

One of the issues I have with current ETL tools is that they are pretty much standalone. They are tough to integrate and don't readily reuse other technology/logic. For instance, I might have a domain/business model already developed, and instead of taking raw data and putting it into a db, I would like to use the model instead, since it already contains the business rules/persistence logic. Maybe just being able to use an existing business rules engine would be enough.

I think this fits in with your vision. I've posted the following from your Solutions Oriented Approach page

* Often the solution to a business problem is a process that includes Business Intelligence.
* Therefore: the Business Intelligence, by itself, is not the solution to the problem.
* If Business Intelligence is part of the process, the Business Intelligence tools are, unavoidably, also part of the process.
* A Business Intelligence tool that does not understand processes, or how to be part of one, will be hard to integrate into the solution.

07-12-2005, 05:14 AM
If someone would like to volunteer to contribute an integration wrapper for Babeldoc, that would be great

How can we do things like this? Do we have to wait for the first release? :whistle: Any idea when this month it will happen? :)

07-12-2005, 10:46 AM
We will be releasing the first code at the end of this month. Just keep whistling. There will not be much, if any, Octopus integration in the first drop.

Our ETL support will be mainly focused on integrating an ETL tool into the process of the platform so that you can schedule and coordinate the ETL activities with other activities, so we will absorb some of the integration pain.

James Dixon

07-14-2005, 10:21 AM
Thanks James. I am definitely whistling a happy tune.

I am just trying to gauge where things are at. I have been "selling" this to management. I don't know if you saw any of my previous comments, but this project is doing a lot of what I was thinking of doing ... or at least wanting to do. I, like the person who posted about metadata, had identified many of the products you are incorporating. So this project and your backgrounds validate what I was thinking, doing, and presenting. Of course, now management wants to know when.

I would definitely like to help out and contribute back. I have helped where I could on Mondrian and JPivot and will probably do more once the code is in a public repository.

07-14-2005, 12:06 PM
Our current roadmap is available on the site. Our first code is available this month. Reporting will be at an official 1.0 release in October, with Analysis in November.

James Dixon
Chief Geek

07-14-2005, 01:29 PM
Thanks. I've looked at that (multiple times). And what is available from sourceforge. And Googled a bunch. I look forward to July 31. :) I was hoping to be involved sooner but I'll just have to wait.

12-12-2005, 05:58 AM

It seems to be a general idea that reporting and analytical tools should be able to handle everything and the kitchen sink as far as data manipulation goes.

This view is somewhat pushed by the traditional BI vendors, as it provides them with instant vendor lock-in.

I (and many others in ETL land), in contrast, strongly feel that any calculation or data manipulation worth mentioning should occur in the ETL process, and that the result should be stored in the data warehouse.

This practice reduces the risk of different users applying different calculations for the same report.
The general rule of thumb for me as a data warehouse designer is to push all complex lookups, etc. towards the ETL process: not only because BI tools often can't handle complex lookups and transformations, but mainly because of consistency.

Therefore, I would like to convince you to include a powerful ETL tool that allows users to quickly build and maintain a full data warehouse.
For me this means including slowly changing dimensions. Copying a couple of tables and hoping that the reporting engine will be able to cope just won't do.
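To make the slowly-changing-dimensions point concrete, here is a minimal, self-contained Java sketch of type 2 handling (row versioning instead of overwriting), which is the kind of logic being argued for in the ETL layer. The class and field names are illustrative only, not from Kettle or any other tool.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal type 2 slowly changing dimension: instead of overwriting an
// attribute, we close the current row and insert a new version, so old
// facts keep pointing at the attribute values valid when they were loaded.
public class ScdType2Demo {

    static class DimRow {
        final int surrogateKey;   // warehouse key, unique per version
        final String naturalKey;  // business key, shared by all versions
        final String city;        // tracked attribute
        boolean current;          // true for the latest version only

        DimRow(int sk, String nk, String city) {
            this.surrogateKey = sk;
            this.naturalKey = nk;
            this.city = city;
            this.current = true;
        }
    }

    private final List<DimRow> dimension = new ArrayList<>();
    private int nextKey = 1;

    // Apply an incoming record: insert a new version only if the
    // tracked attribute actually changed.
    public void load(String naturalKey, String city) {
        for (DimRow row : dimension) {
            if (row.current && row.naturalKey.equals(naturalKey)) {
                if (row.city.equals(city)) {
                    return; // no change, nothing to do
                }
                row.current = false; // close the old version
                break;
            }
        }
        dimension.add(new DimRow(nextKey++, naturalKey, city));
    }

    public int versionCount(String naturalKey) {
        int n = 0;
        for (DimRow row : dimension) {
            if (row.naturalKey.equals(naturalKey)) n++;
        }
        return n;
    }

    public String currentCity(String naturalKey) {
        for (DimRow row : dimension) {
            if (row.current && row.naturalKey.equals(naturalKey)) return row.city;
        }
        return null;
    }

    public static void main(String[] args) {
        ScdType2Demo dim = new ScdType2Demo();
        dim.load("CUST-1", "Brussels");
        dim.load("CUST-1", "Brussels"); // unchanged: no new version
        dim.load("CUST-1", "Antwerp");  // changed: second version
        System.out.println(dim.versionCount("CUST-1")); // prints 2
        System.out.println(dim.currentCity("CUST-1"));  // prints Antwerp
    }
}
```

Doing this once in the ETL layer is exactly the consistency argument above: every report sees the same version history, rather than each BI tool reimplementing the lookup.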

Please feel free to consider including Kettle.
Kettle is an ETL tool that turned LGPL about 10 days ago.

The project page is at: http://kettle.javaforge.com
More info: http://www.kettle.be

Pentaho looks like a great tool!

Kind regards,

Matt Casters

12-12-2005, 06:21 AM

I agree with many of your points here.

Just as we do with Reporting engines, we will try to support multiple ETL tools so that different options are available.

If you could send us the Java code for how to initiate an ETL activity in Kettle it would help us a lot.

James Dixon

12-12-2005, 06:49 AM
Actually, Kettle is 100% meta-data driven.

You can create transformations by designing them in a GUI.
You can also program them. Look at this page for a simple example of how to program a transformation:


If you have any questions about this, just ask.

You say you would like to support any ETL engine and I applaud you for that. It certainly seems the best strategy.
However, is there anything that the ETL part can do to better integrate with the reporting engine/metadata?

I was thinking along the lines of a data model extractor based upon the transformations in the ETL. Kettle also has functionality to backtrack the origin of (calculated) fields in tables. However, is that kind of information something you can use? There is this question of interfacing, etc. Many questions, but I'm sure they can all be answered over time ;-)



12-15-2005, 11:39 PM

On the topic of metadata I would say that support for JMI and the standard MOF metamodels under the CWM (including the OLAP and data mining models) would be the way to start. Lots of people have already agreed on metadata models and APIs, why start from scratch?

If you are interested in writing a component for the Pentaho platform that executes an existing Kettle transformation, then Kettle will be the premier (and only) ETL tool supported by Pentaho. The API is quite easy if you subclass org.pentaho.component.ComponentBase, and there are lots of sample components. Pentaho will instantly provide web services, workflow integration, scheduling, auditing, etc. for Kettle.
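To illustrate the subclass-and-override pattern being described, here is a self-contained sketch. The real base class is org.pentaho.component.ComponentBase; its actual method names and signatures are not shown in this thread, so the stand-in base class and its validate/execute lifecycle below are assumptions made purely for illustration.

```java
// Illustrative only: a stand-in for the platform's component base class.
// The real class is org.pentaho.component.ComponentBase; the lifecycle
// method names here (validate, execute) are assumptions for this sketch.
abstract class SketchComponentBase {
    protected abstract boolean validate(); // check required inputs exist
    protected abstract boolean execute();  // do the work, report success

    public final boolean run() {
        return validate() && execute();
    }
}

// A hypothetical Kettle component: given the name of a transformation,
// it would look it up and run it. The actual repository call is stubbed.
public class KettleComponentSketch extends SketchComponentBase {
    private final String transformationName;
    private boolean executed = false;

    public KettleComponentSketch(String transformationName) {
        this.transformationName = transformationName;
    }

    @Override
    protected boolean validate() {
        // A real component would also check repository connectivity etc.
        return transformationName != null && !transformationName.isEmpty();
    }

    @Override
    protected boolean execute() {
        // Stub: a real component would load the transformation from the
        // Kettle repository and execute it here.
        executed = true;
        return true;
    }

    public boolean wasExecuted() {
        return executed;
    }

    public static void main(String[] args) {
        KettleComponentSketch c = new KettleComponentSketch("load_sales");
        System.out.println(c.run()); // prints true
    }
}
```

The point of the pattern is that the platform drives the lifecycle, so a component author only fills in validation and the actual work; scheduling, auditing, and workflow hooks come for free from the surrounding platform.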


12-16-2005, 12:21 AM
> why start from scratch?

CWM in particular and XML in general are interfaces. For me, that's as far as it goes. My impression when I started with Kettle was that CWM would hinder the development because of the sheer complexity of that standard. It certainly would have been a lot less pleasant to develop with.
So that would be a reason to start from scratch. ;-)

But again, I see XML as an interface that can operate independently from "native" meta-data and in the future it would probably not be all too hard to make a converter.

>API is quite easy if you subclass org.pentaho.component.ComponentBase and there are lots of sample

Great! I'll take a look at ComponentBase and start linking.
I should have a little bit of time available over the weekend ;-)


02-16-2006, 02:53 AM
MattCasters wrote:

Great! I'll take a look at ComponentBase and start linking.
I should have a little bit of time available over the weekend ;-)

Have you made any progress on this?

02-16-2006, 03:48 AM
A KettleComponent has been written by James Dixon and myself (mostly James ;-)).
It's a proof of concept, but it allows a transformation from a repository to be executed. More work lies ahead, though...

With a little patience and tender loving care we'll eventually get there ...



02-16-2006, 04:07 AM
I would very much like to have a look at it, I was about to write a component myself.

Are you using the kettle.pan class to run the transformation?

02-16-2006, 04:34 AM
No, the class that allows transformation metadata (TransMeta) to be executed is simply called Trans. Methods like execute(), etc. are available.
Perhaps James can post the source; I don't have the final version with me, sorry about that.