Data Mining

11-30-2006, 01:11 AM
Perhaps you caught the news, perhaps not, but Pentaho just got a bit larger again through the acquisition of the Weka project (http://www.pentaho.org/news/releases/20060919_pentaho_acquires_weka.php).
I’m really excited about this because it means we can finally crank out a couple of new steps for Kettle without having to release the whole Kettle project under the GPL. We could create new Weka plugins under the GPL and also offer them to customers under a commercial license.
Some time ago I played around with Weka a bit and found it extremely hard to read data into the various engines it offers. The plan I have for the Kettle-Weka integration is to build a couple of steps that give you the best of both worlds: easy drag-and-drop data integration and state-of-the-art data mining modules. If it weren’t for my heavy workload, I would get started on this right away. Unfortunately, the new metadata architecture for Pentaho (and its GUI) is taking its fair share of time to develop.
Another goal could be to create, or include, a data profiler in Pentaho Data Integration for analysing source data.
All in all, these are very exciting times for Pentaho. It’s an honour to be able to take part in it.

More... (http://www.ibridge.be/?p=19)