Hitachi Vantara Pentaho Community Forums

Thread: Cross Validation in Weka

  1. #1
    Join Date
    Mar 2012
    Posts
    20

    Default Cross Validation in Weka

    Hi,
    I've always read that cross-validation is performed like this:

    "In k-fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds then can be averaged (or otherwise combined) to produce a single estimation"

    So k models are built and their results are averaged. The Weka guide says that each model is always built using ALL of the data set. So how does cross-validation in Weka work? Is the model built from all the data, and does "cross-validation" just mean that k folds are created, the model is evaluated on each fold, and the final output is simply the averaged result over the folds?
    Last edited by Lazza87; 04-12-2012 at 12:24 PM.

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Default

    No, that's not correct. When you perform a k-fold cross-validation from the Explorer or the command-line interface, this is what happens by default:

    1) A model is constructed on all the data. The textual description of this model (i.e. a tree, rules, or whatever) is output to the output area, along with the resubstitution evaluation metrics (i.e. performance on the training data).
    2) Then the k-fold cross-validation is performed (as you describe), and the evaluation for this is printed to the output area.
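
    For example, a rough sketch of those two steps using the Weka Java API looks like this (the J48 classifier and the "iris.arff" file name are just placeholders for whatever classifier and data you are using):

    Code:
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ExplorerStyleEvaluation {
        public static void main(String[] args) throws Exception {
            // load a data set and set the class attribute (placeholder file name)
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // 1) build one model on ALL the data - this is the model whose
            //    textual description is printed in the output area
            J48 tree = new J48();
            tree.buildClassifier(data);
            System.out.println(tree);

            // 2) run the 10-fold cross-validation to estimate performance;
            //    internally this trains a further 10 models, one per fold
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }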

    Cheers,
    Mark.

  3. #3
    Join Date
    Mar 2012
    Posts
    20

    Default

    So we can say that in Weka cross-validation is used only to validate the model built from all the data, rather than also being used to build the final model, right?

  4. #4
    Join Date
    Apr 2012
    Posts
    7

    Default

    So is it normal that, with the different validation options (Training set, Cross-validation, Supplied test set, or Percentage split), the output model (in this case a tree) is exactly the same? Does the validation option only change the evaluation values, such as the TP Rate?

  5. #5
    Join Date
    Aug 2006
    Posts
    1,741

    Default

    Yes, under all modes of evaluation the model output is always the one built on all the data (this is also what is saved when you save a trained model from the Explorer or command line). Cross-validation, percentage split etc. are just methods for getting an estimate of how well the model will perform on future data. If you plan to put a model into production for making predictions on future data then you will want to use all the available training data to produce that model.
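
    A minimal sketch of that workflow with the Java API (again, the classifier and file names are only placeholders): the evaluation step estimates future performance, but the model you keep and deploy is the one trained on everything.

    Code:
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BuildAndEstimate {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");   // placeholder data set
            data.setClassIndex(data.numAttributes() - 1);

            // estimate future performance (cross-validation here; percentage split etc.
            // would serve the same purpose) - this does not produce the final model
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());

            // the model to deploy is built on ALL the available data,
            // which is also what gets saved when you save a trained model
            J48 finalModel = new J48();
            finalModel.buildClassifier(data);
            SerializationHelper.write("j48.model", finalModel);
        }
    }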

    Cheers,
    Mark.

  6. #6

    Default

    "If you plan to put a model into production for making predictions on future data then you will want to use all the available training data to produce that model."

    So we can say that in Weka cross-validation is used only to validate the model built from all the data, rather than also being used to build the final model, right?

  7. #7

    Default

    "When you perform a k-fold cross-validation from the Explorer or the command-line interface, this is what happens by default: ..."

    So the output model (in this case a tree) is exactly the same? Does the validation option only change the evaluation values, such as the TP Rate?

  8. #8
    Join Date
    Aug 2006
    Posts
    1,741

    Default

    I'm not too sure what you are asking here. In the case of the cross-validation evaluation option, one model is learned from all the data (simply for the purposes of printing out the textual description of the model). Following this, 10 models are learned during the actual cross-validation process; these are what is used to produce the evaluation metrics (based on the errors computed for each fold).
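
    To make that concrete, here is roughly what the cross-validation loop itself does (a simplified sketch of what Evaluation.crossValidateModel performs, with J48 and "iris.arff" as placeholders):

    Code:
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ManualCrossValidation {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");   // placeholder data set
            data.setClassIndex(data.numAttributes() - 1);

            int folds = 10;
            Instances randomized = new Instances(data);
            randomized.randomize(new Random(1));
            randomized.stratify(folds);                      // only for a nominal class

            Evaluation eval = new Evaluation(randomized);
            for (int fold = 0; fold < folds; fold++) {
                Instances train = randomized.trainCV(folds, fold);
                Instances test = randomized.testCV(folds, fold);

                // a fresh model is learned on each training fold -> 10 models in total
                J48 foldModel = new J48();
                foldModel.buildClassifier(train);

                // errors on the held-out fold are accumulated in the evaluation object
                eval.evaluateModel(foldModel, test);
            }

            // the metrics printed in the output area come from these accumulated errors
            System.out.println(eval.toSummaryString());
        }
    }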

    Cheers,
    Mark.

  9. #9

    Default

    Hi Mark
    I am still not able to understand how cross-validation works. Can you please elaborate it with an example?

    I would really appreciate it!
    Regards
    Reena

  10. #10
    Join Date
    Aug 2006
    Posts
    1,741

    Default

    See k-fold cross-validation in the Wikipedia article:

    https://en.wikipedia.org/wiki/Cross-validation_(statistics)

    Cheers,
    Mark.
