Hitachi Vantara Pentaho Community Forums

Thread: Weka Java API - Linear Regression ...

  1. #1
    Join Date
    Sep 2013
    Posts
    15

    Weka Java API - Linear Regression ...

    Hi there,

    I have been looking at the Java API for Weka and have had great difficulty finding out how to pass a CSV file to the regression method. It seems that there is a method to convert the CSV file to a specific file format called ARFF, which is another text format.

    My question is how the data fields should be arranged, given several independent variables (as in multiple linear regression) and one dependent variable.

    Should a row look like this:
    x1,x2,x3,x4,x5,Y
    Or the reverse:
    Y,x1,x2,x3,x4,x5

    I am asking because, if the only way to get data into the regression is via this ARFF file structure, I need to know how to structure the CSV data so that it is transformed properly to ARFF.

    This is what I have discovered so far.

    But is there a method that takes something like a 2D array as input to Weka's regression in Java code?

    Also, once the data is passed in, which method returns the array of coefficients?

    Hope someone can help.

    Regards,

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Weka has a package of "converters" for reading various data formats into its internal Instances data structure. Use weka.core.converters.CSVLoader to read your CSV file and call its getDataSet() method to get an Instances object. You can then call setClassIndex() on the Instances object to designate which column/attribute in the data is the dependent one.
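
    A minimal sketch of those steps in Java, assuming a CSV with a header row and the dependent variable in the last column. The class name CsvRegressionSketch, the helper fitFromCsv and the sample numbers are made up for illustration. As far as I can tell, LinearRegression.coefficients() returns the fitted weights in attribute order, with a zero in the class column's slot and the intercept in the last slot, but check the Javadoc for your Weka version.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

class CsvRegressionSketch {

    /** Loads a CSV file, treats the last column as the dependent variable, and fits a linear model. */
    static double[] fitFromCsv(File csv) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(csv);                         // by default the first row is read as attribute names
        Instances data = loader.getDataSet();
        data.setClassIndex(data.numAttributes() - 1);  // zero-based index of the dependent column

        LinearRegression model = new LinearRegression();
        model.buildClassifier(data);
        return model.coefficients();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data, roughly y = 2x + 1.
        Path csv = Files.createTempFile("sample", ".csv");
        Files.write(csv, Arrays.asList("x,y", "1,3.1", "2,4.9", "3,7.0", "4,9.1", "5,10.9", "6,13.0"));
        System.out.println(Arrays.toString(fitFromCsv(csv.toFile())));
    }
}
```

    If the dependent variable is in the first column instead, pass 0 to setClassIndex(); the order of the other columns does not matter to the loader.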

    Cheers,
    Mark.

  3. #3
    Join Date
    Sep 2013
    Posts
    15

    Quote: Originally posted by Mark
    Weka has a package of "converters" for reading various data formats into its internal Instances data structure. Use weka.core.converters.CSVLoader to read your CSV file and call its getDataSet() method to get an Instances object. You can then call setClassIndex() on the Instances object to designate which column/attribute in the data is the dependent one.

    Cheers,
    Mark.

    OK, I was getting confused by the names of the structures in the Weka API. So the Instances data structure is the one that holds the data.
    So I guess the data in the CSV can be arranged either way I described earlier, and I then specify the dependent column by its index (a number). I guess the column numbering is zero-based.

    So if the Instances object represents the data, I guess that means that if one hard-coded the data into an array or ArrayList, one could load that into an Instances object as well.

    Let me know if I am right about this.

    Thanks so much.

  4. #4
    Join Date
    Sep 2013
    Posts
    15

    Hi there,
    I ran into some trouble with my code on this. The error seems to indicate that it could not load the CSV file.
    I am not sure whether it is a path issue or a Java I/O issue. I put the CSV file right where my Java main class file is, so I don't think I have a path issue. See the code below:


    Error message:
    SEVERE: null
    java.io.IOException: No source has been specified
        at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:839)
        at statsproj.StatsProj.main(StatsProj.java:42)


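    ---------------------------
    package statsproj;

    import java.io.File;
    import java.io.IOException;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.CSVLoader;

    public class StatsProj {

        public static void main(String[] args) {
            String filename = "longley2a.csv"; // relative path, resolved against the working directory
            int classIndex = 0;                // zero-based index of the dependent (class) column
            Instances data = null;

            LinearRegression predictor = new LinearRegression(); // not used yet
            CSVLoader loader = new CSVLoader();
            try {
                loader.setFile(new File(filename));
                data = loader.getDataSet(); // problem here: No source has been specified
                data.setClassIndex(classIndex);
            } catch (IOException ex) {
                Logger.getLogger(StatsProj.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
    ---------------------------------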


    Hope you can help me out again Mark.

    Sincerely,

    P

  5. #5
    Join Date
    Aug 2006
    Posts
    1,741

    The following works for me in Weka 3.7's Groovy console:

    import weka.core.converters.*;
    import weka.core.Instances;

    CSVLoader loader = new CSVLoader();
    loader.setFile(new java.io.File("/Users/mhall/Documents/Pentaho/demo/iris.csv"));

    Instances insts = loader.getDataSet();

    System.out.println(insts.toString());

    Are you running your program in the same directory as the "longley2a.csv" file? Is this CSV file loadable in Weka's Explorer UI, or do you get the same exception? Which version of Weka are you using?
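
    To see which directory the program is actually running in, something like the following may help. It is only a sketch using the standard java.io.File API; the class name WorkingDirCheck and the helper resolve are invented for this example.

```java
import java.io.File;

class WorkingDirCheck {

    /** Returns the absolute location that a relative file name resolves to. */
    static String resolve(String relativeName) {
        return new File(relativeName).getAbsolutePath();
    }

    public static void main(String[] args) {
        // Relative paths are resolved against the JVM's working directory,
        // which an IDE may set to the project root rather than the source folder.
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("CSV resolves to:   " + resolve("longley2a.csv"));
    }
}
```

    If the printed location is not where the CSV actually lives, either move the file there or pass an absolute path to the loader.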

    Cheers,
    Mark.

  6. #6
    Join Date
    Sep 2013
    Posts
    15

    Hi,
    I am using this jar file: weka-dev-3.7.12.jar

    Also, the CSV file is in the same folder as my Java source file.
    I have not tried loading it in the Weka GUI program.

    By the way, is there a way for me to upload the CSV file? It's very small, 1 KB.

    P
    Last edited by PabloGo; 04-04-2016 at 10:17 AM.

  7. #7
    Join Date
    Sep 2013
    Posts
    15

    Hi,
    I had not used the Weka GUI program before, but I just did.
    The file did import. On the first tab, Preprocess, the attributes table shows the first row of my CSV as the attribute names, because my CSV does not contain a header row, just data.
    I thought the program would show the rest of my data in a table view somewhere. I have a total of 16 rows, and the interface says 15 instances (rows, I guess), presumably because one row was taken as the header.
    It says status OK, so it seems all is fine with my CSV file.
    Unless my Java program failed to load it because there is no header row: it could be expecting String values for the headers when my first row is all numeric.

    I would like your input on the possible issues.
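
    One way to test the header-row theory outside Weka is to check whether the first line of the file is entirely numeric. The sketch below uses only the JDK; the class name HeaderRowCheck and the sample rows are made up, and the naive split on commas ignores quoted fields. I believe CSVLoader in the 3.7 series also has a no-header-row option (setNoHeaderRowPresent(true)), but that is an assumption to verify against the Javadoc for your version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class HeaderRowCheck {

    /** Returns true when every comma-separated field on the first line parses as a number. */
    static boolean firstRowIsNumeric(BufferedReader reader) throws IOException {
        String first = reader.readLine();
        if (first == null || first.trim().isEmpty()) {
            return false;
        }
        for (String field : first.split(",")) {
            try {
                Double.parseDouble(field.trim());
            } catch (NumberFormatException e) {
                return false;   // at least one field looks like a name, so a header is plausible
            }
        }
        return true;            // all numeric: the file probably starts with data, not a header
    }

    public static void main(String[] args) throws IOException {
        // Made-up rows for illustration only.
        System.out.println(firstRowIsNumeric(new BufferedReader(new StringReader("1.5,2,3\n4,5,6"))));
        System.out.println(firstRowIsNumeric(new BufferedReader(new StringReader("x1,x2,y\n4,5,6"))));
    }
}
```

    For a real file, wrap a java.io.FileReader in the BufferedReader instead of the StringReader used here.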

    P

  8. #8
    Join Date
    Sep 2013
    Posts
    15

    Hi,
    The Weka Explorer version I used is weka-3-7-13, which I guess means that the jar file it uses is 3.7.13.

    So could it be that version 3.7.12 has some issues?

    Regards.

  9. #9
    Join Date
    Apr 2016
    Posts
    4

    Does Weka's Instances work with a custom data model? I already fetch the data from a database using JPA (Java). Now I want to apply Weka's time series forecasting algorithm to my data. I followed the example at http://wiki.pentaho.com/display/DATA...ting+with+Weka, but I have a problem with the data path because I created my own data model, so I don't have a file path. My data model class is ODCalls.java.
    Last edited by prakhar; 04-08-2016 at 09:22 AM.

  10. #10
    Join Date
    Aug 2006
    Posts
    1,741

    You will have to convert your data into Weka's Instances format, as this is what the time series forecasting environment works with.
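
    A rough sketch of such a conversion with no file involved, assuming the model's fields are all numeric and have already been copied into a double[][]. The class name InMemoryInstancesSketch, the helper fromRows, the relation name "calls" and the sample values are invented for illustration; dates or nominal fields would need other Attribute constructors, which are not shown.

```java
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

class InMemoryInstancesSketch {

    /** Copies rows of numeric values into a new Instances object, one numeric attribute per column. */
    static Instances fromRows(String relationName, String[] columnNames, double[][] rows) {
        ArrayList<Attribute> attributes = new ArrayList<>();
        for (String name : columnNames) {
            attributes.add(new Attribute(name));           // single-argument constructor = numeric attribute
        }
        Instances data = new Instances(relationName, attributes, rows.length);
        for (double[] row : rows) {
            data.add(new DenseInstance(1.0, row.clone())); // weight 1.0; clone so the caller's array is untouched
        }
        return data;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for fields read from JPA entities.
        double[][] rows = { {1, 3.1}, {2, 4.9}, {3, 7.0} };
        Instances data = fromRows("calls", new String[] {"x", "y"}, rows);
        data.setClassIndex(1);                             // zero-based index of the dependent column
        System.out.println(data);
    }
}
```

    The resulting Instances object can be handed to a classifier or forecaster in the same way as one loaded from a CSV or ARFF file.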

    Cheers,
    Mark.

  11. #11
    Join Date
    Apr 2016
    Posts
    4

    How can I convert my data into Weka's Instances format? My data is in my model (named ODCall.java).
    How can Weka's Instances work without a file path? If you could provide some code, it would be a great help.

    Regards,
    Prakhar Shrivastava

  12. #12
    Join Date
    Sep 2013
    Posts
    15

    Quote: Originally posted by Mark
    You will have to convert your data into Weka's Instances format, as this is what the time series forecasting environment works with.

    Cheers,
    Mark.

    Hi Mark,

    It seems this other person used my thread to ask a question. I was surprised by that.

    But I tried your code with my file, and it does not work.
    I get the same error message.

    I tried the Weka jar file from the application itself, which I believe is weka-3.7.13.jar.
    Using the Weka Explorer GUI app, I was able to import my txt file.

    In my first attempt I used the weka-dev-3.7.12.jar file.

    Hope you can help.

  13. #13
    Join Date
    Aug 2006
    Posts
    1,741

    I'm not too sure what to suggest. Are you sure that the path you are providing exists? If you construct a File object with your path and call exists() on it, does it return true? There is no magic in the Explorer; it uses the same classes to load data.
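
    The check could look roughly like this. It is a sketch using only java.io.File; the class name CsvPathGuard and the helper requireExisting are made up for the example, and the file name is the one from your code.

```java
import java.io.File;
import java.io.FileNotFoundException;

class CsvPathGuard {

    /** Returns the file if it exists and is a regular file; otherwise reports where it was looked for. */
    static File requireExisting(String path) throws FileNotFoundException {
        File file = new File(path);
        if (!file.exists() || !file.isFile()) {
            throw new FileNotFoundException("Not found: " + file.getAbsolutePath());
        }
        return file;
    }

    public static void main(String[] args) {
        try {
            File csv = requireExisting("longley2a.csv");
            System.out.println("Found " + csv.getAbsolutePath()); // safe to hand to loader.setFile(csv)
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());                   // shows the absolute path that was tried
        }
    }
}
```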

    Cheers,
    Mark.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.