I’ve tried the mailing list (http://list.waikato.ac.nz/pipermail/...er/065332.html) and Stack Overflow (http://stackoverflow.com/questions/3...evaluation); here’s a shout-out via this Pentaho forum:

Suppose `X` is a raw, labeled (i.e., with training labels) data set, and `Process(X)` returns a set `Y` of instances that have been encoded with attributes and converted into a Weka-friendly file like `Y.arff`.

Also suppose `Process()` has some 'leakage': some instances `Leak = X-Y` can't be encoded consistently, and need to get a default classification `FOO`. The training labels are also known for the Leak set.

My question is how best to introduce the instances from `Leak` into the Weka evaluation stream AFTER some classifier has been applied to the subset `Y`, so that the `Leak` instances are folded in with their default classification label `FOO` before evaluation is performed across the full set `X`.

In code:

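import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Leak: instances that Process() could not encode; Y: the encoded instances
DataSource LeakSrc = new DataSource("leak.arff");
Instances Leak = LeakSrc.getDataSet();
DataSource Ysrc = new DataSource("Y.arff");
Instances Y = Ysrc.getDataSet();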

// I now find the note in https://weka.wikispaces.com/Use+WEKA+in+your+Java+code
// The classifier (in our example tree) should not be trained when handed over to the crossValidateModel method.
// classfr.buildClassifier(Y)

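// YunionLeak = ??
// crossValidateModel also takes a fold count and a Random; 10 folds and seed 1 are placeholders
eval.crossValidateModel(classfr, YunionLeak, 10, new java.util.Random(1));

Maybe this is a specific example of folding together results from multiple classifiers?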
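
To make the bookkeeping I have in mind concrete, here is a minimal sketch in plain Java that deliberately does not use Weka. It assumes class labels are already mapped to integer indices, that per-instance predictions for `Y` are available from somewhere (for example from cross-validation), and that every `Leak` instance is simply assigned the default label. The class `PooledEvaluation` and its method names are hypothetical, made up for illustration; they are not part of any library.

```java
/**
 * Library-independent sketch: pool classifier predictions on Y with a
 * fixed default label on Leak, then score the result over all of X.
 * All names here are hypothetical and for illustration only.
 */
final class PooledEvaluation {

    /**
     * Builds a confusion matrix indexed [actual][predicted] over X = Y + Leak.
     * Y instances contribute their classifier prediction; every Leak
     * instance contributes the default label as its "prediction".
     */
    static int[][] pooled(int numClasses, int[] yActual, int[] yPredicted,
                          int[] leakActual, int defaultLabel) {
        if (yActual.length != yPredicted.length) {
            throw new IllegalArgumentException("yActual and yPredicted differ in length");
        }
        int[][] matrix = new int[numClasses][numClasses];
        for (int i = 0; i < yActual.length; i++) {
            matrix[yActual[i]][yPredicted[i]]++;
        }
        for (int actual : leakActual) {
            matrix[actual][defaultLabel]++;
        }
        return matrix;
    }

    /** Fraction of instances on the diagonal of the confusion matrix. */
    static double accuracy(int[][] matrix) {
        int correct = 0;
        int total = 0;
        for (int a = 0; a < matrix.length; a++) {
            for (int p = 0; p < matrix[a].length; p++) {
                total += matrix[a][p];
                if (a == p) {
                    correct += matrix[a][p];
                }
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("empty confusion matrix");
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        // Toy data: class 0 stands for FOO, class 1 for any other label.
        int[] yActual    = {0, 1, 1, 0, 1};
        int[] yPredicted = {0, 1, 0, 0, 1};   // assumed classifier output on Y
        int[] leakActual = {0, 1, 0};         // known training labels of Leak
        int foo = 0;                          // default label given to Leak

        int[][] yOnly = pooled(2, yActual, yPredicted, new int[0], foo);
        int[][] full  = pooled(2, yActual, yPredicted, leakActual, foo);
        System.out.println("accuracy on Y only: " + accuracy(yOnly));
        System.out.println("accuracy on X:      " + accuracy(full));
    }
}
```

On the toy arrays in `main`, 4 of the 5 `Y` instances are classified correctly, and 2 of the 3 `Leak` instances happen to have `FOO` as their true label, so the pooled figure over `X` is 6 of 8. What I am really asking is whether Weka's `Evaluation` can do this pooling itself, rather than my having to extract the per-instance predictions and tally them by hand as above.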