Hitachi Vantara Pentaho Community Forums

Thread: What happens with absolute errors?

  1. #1
    Join Date
    Jun 2013

    Question What happens with absolute errors?

    I’ve been happily using Weka for the last 6 years and now I have a question I’m not sure how to deal with…
    I’ve run simulations comparing the results of imputing a continuous variable using a traditional statistical method (PMM multiple imputation in SAS) and data mining procedures (M5P, MLP Regressor, least median squared regression and RBF regressor). On the usual Weka evaluation measures (correlation between real and imputed values, mean absolute error, root mean squared error and root relative squared error), the Weka procedures come out around 20% better (I used microdata files from official statistical surveys for 5-6 different European countries).
    Since one important aim of these surveys is to estimate population averages, I simulated 50% missing data for each survey and computed the average of the semi-imputed files as an estimate of the average in the presence of missing data. What I observed is that the averages obtained with the Weka imputations are further from the original average than those obtained with MI-PMM imputation. Thus, ALTHOUGH THE INDIVIDUAL ONE-BY-ONE ERRORS ARE SMALLER, THE ERROR ON THE AVERAGE VALUE IS BIGGER.
    Studying the issue more thoroughly, what happens is that the average signed error (the average of the errors, both > 0 and < 0, without taking absolute values) is more often < 0: the imputed values are smaller than the corresponding originals more frequently than the contrary.
    I ran simulations on different randomized training/test splits of the original files (10 x 2 for each country), with unweighted and weighted data, and the result is always the same, while, on the same simulated splits, MI-PMM gives smaller errors that are balanced between < 0 and > 0.
    I thought it could be a question of precision, since I used the logarithms of the variables to be imputed (maybe Weka truncated the decimal positions at some step), but I have just checked that this is not the case. So, COULD IT BE THAT ALL THESE PROCEDURES PRODUCE SYSTEMATICALLY SMALLER IMPUTATIONS THAN THE ORIGINAL VALUES, EVEN THOUGH IN ABSOLUTE TERMS THE RESULTS ARE BETTER? Something must be wrong, because this is not what the law of large numbers would lead us to expect…
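    To make the distinction concrete, here is a small self-contained sketch (not Weka code; the class name and the numbers are invented for illustration) showing that an imputer with a smaller mean absolute error can still bias the estimated average more than an imputer whose larger errors cancel in sign:

    ```java
    // Illustration only: invented numbers, not from the surveys above.
    // Shows that a smaller mean ABSOLUTE error does not imply a smaller
    // error on the AVERAGE: signed errors may fail to cancel.
    public class BiasVsAbsError {
        static double mean(double[] x) {
            double s = 0;
            for (double v : x) s += v;
            return s / x.length;
        }

        static double meanAbsError(double[] pred, double[] actual) {
            double s = 0;
            for (int i = 0; i < pred.length; i++) s += Math.abs(pred[i] - actual[i]);
            return s / pred.length;
        }

        public static void main(String[] args) {
            double[] actual = {10, 12, 14, 16};          // mean = 13
            // Imputer A: small individual errors, but all negative (biased low).
            double[] a = {9.5, 11.5, 13.5, 15.5};
            // Imputer B: larger individual errors, but the signs cancel.
            double[] b = {11, 11, 15, 15};

            System.out.printf("A: MAE=%.2f, error on average=%.2f%n",
                    meanAbsError(a, actual), mean(a) - mean(actual));
            System.out.printf("B: MAE=%.2f, error on average=%.2f%n",
                    meanAbsError(b, actual), mean(b) - mean(actual));
            // A has MAE 0.5 but shifts the average by -0.5;
            // B has MAE 1.0 but leaves the average unchanged.
        }
    }
    ```

    This is exactly the pattern described above: the MAE ranks A better, yet the population-average estimate from A is worse.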
    Thank you. Best regards,

  2. #2
    Join Date
    Aug 2006


    I have to admit that I don't really understand what you are doing. Are you using Weka regression schemes to predict numeric target values as a method of missing value imputation?

    Mean absolute error (and all Weka evaluation metrics) is computed by weka.classifiers.Evaluation. The code is very simple:

      /**
       * Update the numeric accuracy measures. For numeric classes, the accuracy is
       * between the actual and predicted class values. For nominal classes, the
       * accuracy is between the actual and predicted class probabilities.
       *
       * @param predicted the predicted values
       * @param actual the actual values
       * @param weight the weight associated with this prediction
       */
      protected void updateNumericScores(double[] predicted, double[] actual,
          double weight) {
        double diff;
        double sumErr = 0, sumAbsErr = 0, sumSqrErr = 0;
        double sumPriorAbsErr = 0, sumPriorSqrErr = 0;
        for (int i = 0; i < m_NumClasses; i++) {
          diff = predicted[i] - actual[i];
          sumErr += diff;
          sumAbsErr += Math.abs(diff);
          sumSqrErr += diff * diff;
          diff = (m_ClassPriors[i] / m_ClassPriorsSum) - actual[i];
          sumPriorAbsErr += Math.abs(diff);
          sumPriorSqrErr += diff * diff;
        }
        m_SumErr += weight * sumErr / m_NumClasses;
        m_SumAbsErr += weight * sumAbsErr / m_NumClasses;
        m_SumSqrErr += weight * sumSqrErr / m_NumClasses;
        m_SumPriorAbsErr += weight * sumPriorAbsErr / m_NumClasses;
        m_SumPriorSqrErr += weight * sumPriorSqrErr / m_NumClasses;
      }

      /**
       * Returns the mean absolute error. Refers to the error of the predicted
       * values for numeric classes, and the error of the predicted probability
       * distribution for nominal classes.
       *
       * @return the mean absolute error
       */
      public final double meanAbsoluteError() {
        return m_SumAbsErr / (m_WithClass - m_Unclassified);
      }
    Note that the averages are only computed over those test instances that have an actual class value.
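    The relevant point for your question is visible in the accumulation: `m_SumAbsErr` takes `Math.abs(diff)`, so any systematic sign in the errors is invisible in the MAE, while the signed sum `m_SumErr` would reveal it. Here is a stripped-down, self-contained sketch of that accumulation (single numeric class, unit weights; field and method names are simplified for illustration and `meanSignedError` is not a Weka method):

    ```java
    // Stripped-down sketch of the accumulation in updateNumericScores above
    // (one numeric class, unit weights); not the actual Weka Evaluation class.
    public class NumericScores {
        double sumErr = 0;     // signed errors: reveals systematic bias
        double sumAbsErr = 0;  // absolute errors: sign information is lost
        int n = 0;

        void update(double predicted, double actual) {
            double diff = predicted - actual;
            sumErr += diff;
            sumAbsErr += Math.abs(diff);
            n++;
        }

        double meanAbsoluteError() { return sumAbsErr / n; }
        double meanSignedError()   { return sumErr / n; }

        public static void main(String[] args) {
            NumericScores s = new NumericScores();
            // Predictions consistently below the actual values:
            s.update(9.0, 10.0);
            s.update(19.0, 20.0);
            s.update(29.0, 30.0);
            // MAE = 1.0 while the mean signed error = -1.0: the MAE alone
            // cannot distinguish this from unbiased errors of the same size.
            System.out.println(s.meanAbsoluteError() + " " + s.meanSignedError());
        }
    }
    ```

    So if your imputations are systematically low, comparing MAE across methods will never show it; you would need to look at the signed mean error as well.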


  3. #3
    Join Date
    Jun 2013


    I don't consider this code simple; it is very complicated to understand and apply…


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.