After some preliminary searching for known issues with the Weka project, it seems that Weka is not well suited to large data sets. This seems a major drawback in comparison to its commercial competitors like Oracle, SPSS...

Why has the Pentaho community opted for Weka if this is the case? Does the Pentaho architecture provide a workaround for this issue?