Hitachi Vantara Pentaho Community Forums

Thread: Requested array size exceeds VM limit with cross-validation

  1. #1
    Join Date
    Jul 2016
    Posts
    2

    Hello,

    I'm facing an issue with Weka reporting that an array has become too big to be handled by the JVM:

    Code:
    Exception in thread "Thread-5" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
        at java.lang.StringBuffer.append(StringBuffer.java:272)
        at weka.classifiers.bayes.NaiveBayes.toString(NaiveBayes.java:717)
        at weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1438)
    This error occurs with both Weka 3.8 and 3.9.1-SNAPSHOT (nightly build) when I run a 10-fold cross-validation with the Naive Bayes classifier via the GUI. What I do not understand is that when I run it from the command line:

    Code:
    java -cp weka.jar weka.classifiers.bayes.NaiveBayes -t training_set_scenario1.arff -x 10 -o
    It works well. Here are the steps to reproduce the error in the GUI:

    1) java -jar weka.jar
    2) Click on "Explorer"
    3) Click on "Open file" and select the .arff file on the filesystem
    4) Click on the tab "Classify"
    5) Choose the "Naive Bayes" algorithm
    6) Click on "Start"

    The .arff file is only about 50 MB, and I have made it available for debugging. You can download it by running:

    Code:
    curl http://3cixty-alpha.eurecom.fr/training_set_scenario1.arff > training_set_scenario1.arff
    Do not hesitate to ask if you need any other information to debug.

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    You have hit a limit on the size of arrays (which are indexed by int) in the VM. This appears to be caused by the textual output of the NaiveBayes model being very large. As you've found, suppressing the output of the model on the command line (the -o option in your command) avoids this issue. The same can be done in the Classify panel of the Explorer by un-checking the "Output model" checkbox under "More options...".
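
    To make the int limit concrete, here is a minimal sketch of the buffer-growth arithmetic behind the stack trace. It assumes the growth policy of AbstractStringBuilder.expandCapacity as found in older JDK 8 builds (other JDK versions grow their buffers differently), and the names CapacityGrowth and nextCapacity are made up for illustration; they are not part of the JDK or Weka.

```java
// Sketch only: mirrors the capacity arithmetic assumed for older JDK 8
// builds of AbstractStringBuilder.expandCapacity. Not JDK or Weka code.
final class CapacityGrowth {

    // Returns the array length a buffer of `current` chars would request
    // when it needs room for at least `minimum` chars.
    static int nextCapacity(int current, int minimum) {
        int grown = current * 2 + 2;   // overflows to a negative int above 2^30
        if (grown - minimum < 0) {
            grown = minimum;           // doubling was not enough
        }
        if (grown < 0) {
            if (minimum < 0) {
                // even the required length no longer fits in an int
                throw new OutOfMemoryError("required length overflows int");
            }
            grown = Integer.MAX_VALUE; // fall back to the largest int
        }
        return grown;
    }

    public static void main(String[] args) {
        // Small buffer: plain doubling plus two.
        System.out.println(nextCapacity(16, 17));                  // 34
        // Buffer of 2^30 chars: doubling overflows, so the request
        // becomes Integer.MAX_VALUE.
        System.out.println(nextCapacity(1 << 30, (1 << 30) + 1));  // 2147483647
    }
}
```

    Under these assumptions, once the model description grows past 2^30 characters the next append asks for an array of Integer.MAX_VALUE elements. That is typically slightly more than HotSpot allows for a single array, which would explain this specific message rather than an ordinary heap-space error, and why raising the heap size alone is unlikely to help.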

    Cheers,
    Mark.

  3. #3
    Join Date
    Jul 2016
    Posts
    2

    Problem solved! Thanks a lot :-)
