Hitachi Vantara Pentaho Community Forums
Thread: Predictions on unlabeled ("scoring" / "test") data in Java Weka

  1. #1
    Join Date
    Aug 2016
    Posts
    3

    I'm using Java to train various tree model classifiers on data from .csv files and SQL databases.

    I am able to train and cross-validate these models without issue. However, I am unable to generate predictions on unlabeled new data. I believe this is because the new data naturally has one fewer column than the training data (i.e. the outcome variable is not in the new data).

    Is the only way to address this problem to add an empty column with the name of the outcome variable in the new data?

    I'd appreciate any confirmation or information on this. It seems a very unusual requirement to have the outcome variable in a prediction data set, but if that's what I need to do then so be it. I'll need to figure out how to add that column in Java, but please reply even if you're not able to help me with that.

    Thanks!

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Yes, you are correct: the outcome variable needs to be declared in the test data. However, all of its values can be set to missing (i.e. "?"). Weka's Add filter is a convenient way to add such a column, as it sets all values to missing for the new attribute.

    Cheers,
    Mark.
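
For the "add that column in Java" part of the question, here is a minimal sketch that works on the CSV text itself, before the file is loaded into Weka. It is a plain-Java alternative to the Add filter, not the Add filter itself. The class name AddMissingClassColumn, the method addMissingColumn, and the column name "outcome" are made up for illustration. It assumes simple comma-separated lines with a header row, and it assumes the CSV loader treats "?" as a missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddMissingClassColumn {

    /**
     * Appends one column to CSV lines: the first non-empty line (the header)
     * gets columnName, every later non-empty line gets "?" (missing).
     * Hypothetical helper; assumes plain comma-separated rows.
     */
    public static List<String> addMissingColumn(List<String> csvLines, String columnName) {
        List<String> out = new ArrayList<>();
        boolean headerPending = true;
        for (String line : csvLines) {
            if (line.isEmpty()) {
                continue; // skip blank lines rather than emit a lone "?"
            }
            out.add(line + "," + (headerPending ? columnName : "?"));
            headerPending = false;
        }
        return out;
    }

    public static void main(String[] args) {
        // Unlabeled data: same predictors as the training set, no outcome column.
        List<String> unlabeled = Arrays.asList("age,income", "34,52000", "51,61000");
        for (String row : addMissingColumn(unlabeled, "outcome")) {
            System.out.println(row);
        }
    }
}
```

One caveat with this sketch: for a nominal outcome, the test data also has to declare the same set of class labels as the training data, which a column containing only "?" does not carry. The Add filter has an option for specifying nominal labels, so it is likely the safer route in that case.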

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.