I was looking at Weka and it is GPL. How does that affect how it will or can be used in Pentaho?

We will integrate with Weka by providing input files in the format it expects, and we will launch Weka as a standalone application. Using Weka this way does not violate the GPL license and does not mean that the GPL license must be applied to the Pentaho platform.
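As a rough illustration of that arrangement, here is a minimal sketch in Java of writing an ARFF input file and then starting Weka in its own JVM process instead of loading it as a library. The class name `WekaLauncherSketch`, the helper methods, the sample data, and the location of `weka.jar` are all assumptions made for the example and are not part of Pentaho. `weka.classifiers.trees.J48` with `-t` (training file) and `-no-cv` (skip cross-validation) is used only as one example of a Weka command-line entry point.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: prepares an ARFF file and launches Weka as a separate process.
public class WekaLauncherSketch {

    // Builds ARFF text: numeric attributes followed by one nominal class attribute.
    static String toArff(String relation, List<String> numericAttributes,
                         List<String> classValues, List<String> csvRows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : numericAttributes) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("@attribute class {")
          .append(String.join(",", classValues))
          .append("}\n\n");
        sb.append("@data\n");
        for (String row : csvRows) {
            sb.append(row).append("\n");
        }
        return sb.toString();
    }

    // Builds the command line that runs Weka in its own JVM.
    // Assumption: wekaJar points at a Weka jar the user downloaded separately.
    static List<String> wekaCommand(Path wekaJar, Path arffFile) {
        return List.of(
            "java", "-cp", wekaJar.toString(),
            "weka.classifiers.trees.J48",   // example classifier only
            "-t", arffFile.toString(),      // training data file
            "-no-cv");                      // skip cross-validation on this tiny sample
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder data, invented for the example.
        String arff = toArff("customers",
            List.of("age", "income"),
            List.of("yes", "no"),
            List.of("25,30000,no", "47,82000,yes"));

        Path arffFile = Files.createTempFile("pentaho-export", ".arff");
        Files.writeString(arffFile, arff);

        if (args.length == 0) {
            System.out.println("Wrote " + arffFile + "; pass the path to weka.jar to launch Weka.");
            return;
        }

        // Weka runs as a child process; its output goes to this console.
        Process process = new ProcessBuilder(wekaCommand(Path.of(args[0]), arffFile))
            .inheritIO()
            .start();
        System.exit(process.waitFor());
    }
}
```

Run without arguments, the sketch only writes the ARFF file. Given a path to a separately downloaded `weka.jar`, it hands the file to Weka in a child process, so the only coupling between the two programs is the file format and the command line.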

The GPL license allows us to distribute and 'link' to Weka without the GPL license applying to the Pentaho Platform. We cannot 'embed' Weka as a library under the GPL license.

It basically comes down to how things are packaged.

The way the licenses work, we can incorporate other LGPL software into our installation because we are LGPL as well, whereas with GPL software we need to point you to the project's own site to download it separately. That said, we can still include all the integration code, etc., from our project contributors in our installation to make deployment as easy as possible.

I figured that is what would need to be done. I wasn't sure how standalone it was.

Are you going to provide some custom look and feel to it so it will look like the rest of the toolset? (Yes, I understand this will need to be provided back to the Weka community)

Also, how good is Weka? The reason I ask: I've posted elsewhere that I was looking to do the same sort of thing, pulling a bunch of OSS projects together to make a "unified" BI project. I had looked at Weka, but the guy I am working with said the reason people use, for instance, SAS is that its formulas are proven. I think we were looking at using ABLE instead, since it came from IBM. But since you all have experience in this area and are going to "use" Weka, it is worth a second look.

I have been working with WEKA as a stand-alone lab of machine learning tools and it works great. I have compared its results with those from tools like SAS and SPSS, and WEKA has performed quite well. The only difference I noted is processing speed: generally, the other tools run faster on huge, high-dimensional datasets.

Anyway, if someone does not trust the results, the code is public and taken from reference papers, so it's not hard to check.

I vote for WEKA as the standard data mining tool for Pentaho.