Hitachi Vantara Pentaho Community Forums

Thread: Equation from MultiLayer Perceptron

  1. #1
    Join Date
    Nov 2014
    Posts
    6

    Default Equation from MultiLayer Perceptron

    Hi everyone,
    I am new to this forum, and to Weka as well. For my university thesis I have run several regression experiments using MultilayerPerceptron on my database (which has 13 attributes plus my output).
    Now that I have checked that the MLP is suitable for my needs, I'd like to extract the equation correlating the input with the output, to make future predictions. In other words, I'd like to supply a new set of 13 attribute values and see what output Weka predicts. Is this possible?
    How can this be done, given that I don't know how to write Java? What is the simplest and briefest way to get that equation?
    Thanks in advance to everyone.
    regards

  2. #2

    Default

    Getting an equation from the MLP would be difficult. Just make a separate test set and choose it under Classify > Supplied test set. In your test set, use ? in the output column.
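    For instance, a test-set ARFF file for a numeric target could look like the sketch below (the attribute names and values here are made up; your header must match your training file, and the output attribute must stay declared numeric):

```
@relation mytest

@attribute attr1 numeric
@attribute attr2 numeric
@attribute output numeric

@data
5.1,3.2,?
4.8,2.9,?
```

    Weka treats ? as a missing value, so with "Output predictions" enabled it will fill in a predicted value for each row's output.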

    If you absolutely need an equation, then try genetic programming, GMDH, or even rules extracted from decision trees.
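    That said, the MLP's "equation" is really just nested weighted sums and sigmoids, and Weka prints the learned weights for each node in the classifier output pane. Here is a rough Python sketch (with made-up weights, and only 2 inputs and 2 hidden units instead of your 13 inputs) of what evaluating that equation by hand would look like for a regression MLP with a linear output unit:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up weights for a tiny MLP: 2 inputs, 2 sigmoid hidden units,
# and a linear output unit (the usual setup for a numeric target).
hidden_weights = [[0.5, -1.2],   # weights into hidden unit 0
                  [2.0,  0.3]]   # weights into hidden unit 1
hidden_bias    = [0.1, -0.4]
output_weights = [1.5, -0.7]
output_bias    = 0.2

def predict(inputs):
    # output = c + sum_j W_j * sigmoid(b_j + sum_i w_ji * x_i)
    hidden = [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
              for ws, b in zip(hidden_weights, hidden_bias)]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))
```

    In principle you could copy the trained weights out of Weka's output into something like this; it just gets unwieldy fast with 13 inputs and many hidden units, which is why people rarely bother.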

  3. #3
    Join Date
    Nov 2014
    Posts
    6

    Default

    So I managed to make Weka work with my new test set; in fact, I wasn't setting the output column (the one with "?") as numeric but as string.
    Now I have a question: why do I get different results for an identical instance when using two different test sets that differ only in the number of rows?
    This is a very important question for me, thanks.

    Adriano
    Last edited by adriano3; 12-08-2014 at 06:12 AM.

  4. #4
    Join Date
    Nov 2014
    Posts
    6

    Default

    I have split my original data set into two sets: the first (which I call the training set) is left unchanged, while in the second (the test set) I have replaced all the output values with "?". But when I train on my training set and evaluate on my test set as the supplied test set, after using InputMappedClassifier, I receive an error saying: "problem evaluating classifier: index: 134, size:1".
    What am I doing wrong? The two files have the same number of features and the same attribute names; the only thing I have changed is the output, which is set to "?".
    Thanks

    Adriano


  6. #6

    Default

    Quote Originally Posted by adriano3 View Post
    So I managed to make Weka work with my new test set; in fact, I wasn't setting the output column (the one with "?") as numeric but as string.
    Now I have a question: why do I get different results for an identical instance when using two different test sets that differ only in the number of rows?
    This is a very important question for me, thanks.

    Adriano


    Is this time series data where the attributes or the output are shifted? If so, I suspect you have a future leak in your attribute(s). Say you have ten data points in your test set: if you remove point 10 (the last instance), IN NO WAY should the outputs for the earlier instances (1-9) change.

    It could also be that you are using a classifier with a different random seed.
    Last edited by Mike1961; 12-08-2014 at 10:42 PM.

  7. #7

    Default

    Quote Originally Posted by adriano3 View Post
    I have split my original data set into two sets: the first (which I call the training set) is left unchanged, while in the second (the test set) I have replaced all the output values with "?". But when I train on my training set and evaluate on my test set as the supplied test set, after using InputMappedClassifier, I receive an error saying: "problem evaluating classifier: index: 134, size:1".
    What am I doing wrong? The two files have the same number of features and the same attribute names; the only thing I have changed is the output, which is set to "?".
    Thanks

    Adriano

    What version of Weka are you using? I remember early versions being very finicky about train/test sets. In version 3-7-11 it never does this to me anymore, and you don't have to use InputMappedClassifier; it does the mapping automatically.

  8. #8
    Join Date
    Nov 2014
    Posts
    6

    Default

    Quote Originally Posted by Mike1961 View Post
    Is this time series data where the attributes or the output are shifted? If so, I suspect you have a future leak in your attribute(s). Say you have ten data points in your test set: if you remove point 10 (the last instance), IN NO WAY should the outputs for the earlier instances (1-9) change.

    It could also be that you are using a classifier with a different seed maybe.
    Hello, thanks for your answer; I don't understand what you mean by "the attributes or the output are shifted".
    Then, what do I have to do to keep the earlier instances' outputs from changing? I'm currently using the MLP classifier with the seed parameter set to 0; is that correct? Should I try changing it?
    Thanks again

  9. #9

    Default

    Quote Originally Posted by adriano3 View Post
    Hello, thanks for your answer; I don't understand what you mean by "the attributes or the output are shifted".
    Then, what do I have to do to keep the earlier instances' outputs from changing? I'm currently using the MLP classifier with the seed parameter set to 0; is that correct? Should I try changing it?
    Thanks again
    Just make sure all your lags are moved down in the spreadsheet and the output is shifted up (which is the future).

    Another form of future leak is running certain transforms (the ones that need all the information in an attribute) on the entire data set and only then splitting the transformed file into a train and a test set.

    Another form of future leak is interpolation. Let's say you have two different time series frequencies (daily and weekly) and want to use them together. You want to convert the weekly data to daily, so you unknowingly fill in the missing days by interpolation. That's cheating, because days 2, 3, and 4 receive extra information that wouldn't normally be present. Say weekly point 1 is at 500 and weekly point 2 is at 600. By interpolation (which is cheating) the weekly-to-daily series becomes (500, 525, 550, 575, 600). It should instead become (500, 500, 500, 500, 600); that way no future information leaks forward from the 500.
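    A quick Python sketch of the two conversions using the 500/600 example above (the helper names are mine):

```python
# Weekly values observed on day 0 and day 4; we want a 5-day daily series.

def step_fill(weekly, n_days):
    # Carry the last observed value forward -- no future leak.
    out, last = [], None
    for d in range(n_days):
        if d in weekly:
            last = weekly[d]
        out.append(last)
    return out

def linear_interp(weekly, n_days):
    # Fill gaps by interpolating between the surrounding observations.
    # This leaks the future value (600) backwards into days 1-3.
    days = sorted(weekly)
    out = []
    for d in range(n_days):
        if d in weekly:
            out.append(weekly[d])
        else:
            lo = max(k for k in days if k < d)
            hi = min(k for k in days if k > d)
            frac = (d - lo) / (hi - lo)
            out.append(weekly[lo] + frac * (weekly[hi] - weekly[lo]))
    return out
```

    step_fill({0: 500.0, 4: 600.0}, 5) produces (500, 500, 500, 500, 600), while linear_interp produces the cheating series (500, 525, 550, 575, 600).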

    To avoid this, batch your transforms. You can batch your transforms in Weka (gotta love Weka's design... three bows down for the best thought-out and easiest-to-use data mining software around) with the Simple CLI. That way the transform's parameters for the train set and test set will be the same, as they should be.

    Read up on using the CLI; it is easy. Basic format: java "whatever transform you want" -b -c (sometimes) -i "trainingSet".arff -o "trainingSetWithMyNewTransform".arff -r "testSet".arff -s "testSetWithMyNewTransform".arff

    Right-click the transform in Explorer > Classify and save it to the clipboard. Ctrl-V (paste) into the CLI to save a lot of typing. If you get an error, use the up arrow key and correct your mistake to avoid re-typing everything.


    I don't think there is a "right" seed to use, but you can vary the seed to come up with an average classifier performance. A lot of the time one seed can look outstanding, but that is basically an aberration. Use Classifier > meta > EnsembleSelection and vary the seed or other parameters and see what the results are.

    You can also use the Experimenter and average lots of different cross-validation runs together.

    I put in a future leak example. Train the data and test it with the two test sets (7 instances and 8 instances). All I did was remove the last instance (the difference between the two test sets); notice how the earlier predictions change between the two test sets.

    This wasn't exactly a great example, but I'm unsure how to delete the zip file.


    How to check for a future leak:

    Train the data. Then test on the complete test set and note the predictions. Then remove the last instance of the complete test set (now -1 instance) and test on that. Are the previous predictions different? If so, you have a future leak. Keep all seeds and parameters the same as in training.
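    The check can be sketched as a tiny Python helper (predict here is a stand-in for whatever produces your predictions, e.g. re-running the Explorer on the truncated file):

```python
def has_future_leak(predict, test_set):
    # Predict on the full test set, then on the same set minus its last instance.
    preds_full = predict(test_set)
    preds_trunc = predict(test_set[:-1])
    # The earlier predictions must match exactly; any difference means
    # information from the removed (future) instance leaked backwards.
    return any(a != b for a, b in zip(preds_trunc, preds_full))
```

    A model that scores each row on its own passes the check, while one that uses whole-test-set statistics (say, centering on the test-set mean) fails it.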
    Attached Files
    Last edited by Mike1961; 12-23-2014 at 07:14 PM.
