Hitachi Vantara Pentaho Community Forums
Thread: Weka text classification

  1. #1
    Joined Jun 2016, 2 posts

    Weka text classification

    Hi, I am new to Weka. I have started using the TextClassifier example that builds on the Weka API to classify movie reviews. I am getting a NullPointerException and don't know the reason. Please help.
    The error is as follows:
    DATA SET:

    java.lang.IllegalArgumentException: Attribute names are not unique! Causes: 'class'
    at weka.core.Instances.<init>(Instances.java:265)
    at weka.core.DictionaryBuilder.getVectorizedFormat(DictionaryBuilder.java:1250)
    at weka.core.DictionaryBuilder.finalizeDictionary(DictionaryBuilder.java:1789)
    at weka.filters.unsupervised.attribute.StringToWordVector.batchFinished(StringToWordVector.java:689)
    at weka.filters.Filter.useFilter(Filter.java:694)
    at TextClassifier.filterText(TextClassifier.java:451)
    at TextClassifier.classify(TextClassifier.java:209)
    at TextClassifier.main(TextClassifier.java:145)
    java.lang.NullPointerException
    at TextClassifier.classify(TextClassifier.java:213)
    at TextClassifier.main(TextClassifier.java:145)
    DATA SET:
    @relation 'data set'

    @attribute text string
    @attribute class {'?',pos,neg}
    ....

    Exception (sorry!):
    java.lang.NullPointerException
    NEW CASES:

    [Ljava.lang.String;@40ef3420
    java.lang.IllegalArgumentException: Attribute names are not unique! Causes: 'class'
    at weka.core.Instances.<init>(Instances.java:265)
    at weka.core.DictionaryBuilder.getVectorizedFormat(DictionaryBuilder.java:1250)
    at weka.core.DictionaryBuilder.finalizeDictionary(DictionaryBuilder.java:1789)
    at weka.filters.unsupervised.attribute.StringToWordVector.batchFinished(StringToWordVector.java:689)
    at weka.filters.Filter.useFilter(Filter.java:694)
    at TextClassifier.filterText(TextClassifier.java:451)
    at TextClassifier.classifyNewCases(TextClassifier.java:313)
    at TextClassifier.main(TextClassifier.java:149)
    java.lang.NullPointerException
    at TextClassifier.checkCases(TextClassifier.java:374)
    at TextClassifier.classifyNewCases(TextClassifier.java:321)
    at TextClassifier.main(TextClassifier.java:149)

    Warning!
    The text to classify didn't contain a single
    word from the modelled words. This makes it hard for the classifier to
    do something usefull.
    The result may be weird.


    CHECKING ALL THE INSTANCES:
    Class values (in order): '?' 'pos' 'neg'

    Exception (sorry!):
    java.lang.NullPointerException

  2. #2
    Joined Aug 2006, 1,741 posts

    Rename your class attribute to something other than "class". The problem is that the word "class" occurs in the text of your documents, so when this is converted to a feature the name clashes with your actual class attribute name. Use something like @@class@@.

    Cheers,
    Mark.
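
    To make the clash concrete, here is a minimal plain-Java sketch. It does not use Weka: `buildAttributeNames` and `findDuplicates` are hypothetical helpers that only mimic the idea of turning every distinct word into an attribute name and then appending the class attribute, and the sample reviews are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AttributeNameClash {

    // Hypothetical stand-in for word-vector conversion: one attribute per
    // distinct lowercased word, followed by the class attribute.
    static List<String> buildAttributeNames(List<String> documents, String classAttributeName) {
        Set<String> words = new LinkedHashSet<>();
        for (String doc : documents) {
            for (String token : doc.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        List<String> names = new ArrayList<>(words);
        names.add(classAttributeName);
        return names;
    }

    // Returns every name that appears more than once in the header.
    static Set<String> findDuplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Made-up reviews; the first one contains the word "class".
        List<String> reviews = Arrays.asList(
                "A class act from start to finish",
                "Dull and far too long");

        // Class attribute named "class": collides with the word attribute.
        System.out.println(findDuplicates(buildAttributeNames(reviews, "class")));      // prints [class]

        // Class attribute renamed: no collision.
        System.out.println(findDuplicates(buildAttributeNames(reviews, "@@class@@")));  // prints []
    }
}
```

    The NullPointerException lines in the log above look like a follow-on from the failed filtering step, so they should go away once the attribute names are unique. As far as I know, StringToWordVector also has an attribute name prefix option (`-P`), which would be another way to keep word attributes from colliding with the class attribute, though renaming the class attribute is the simpler fix.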

  3. #3
    Joined Jun 2016, 2 posts

    Thank you very much, Mark. This may inconvenience you, but I'll ask anyway: I tried to build my own ARFF file. I took the small IMDb review dataset, built the full dictionary from its 2000 files, and then created instances with number of attributes = number of words in the vocabulary + one class attribute. I then wrote this out as an ARFF file. Now I am trying to read that dataset with the ARFF reader, and I am getting this error:
    Exception in thread "main" java.io.IOException: premature end of line, read Token[EOL], line 9
    at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:354)
    at weka.core.converters.ArffLoader$ArffReader.getNextToken(ArffLoader.java:452)
    at weka.core.converters.ArffLoader$ArffReader.parseAttribute(ArffLoader.java:846)
    at weka.core.converters.ArffLoader$ArffReader.readHeader(ArffLoader.java:815)
    at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:162)
    at TextCategorizationTest.main(TextCategorizationTest.java:30)

  4. #4
    Joined Aug 2006, 1,741 posts

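    This sounds like line 9 of the file is incomplete: the parser hit an end-of-line when it was expecting to read more data. Since the stack trace goes through readHeader and parseAttribute, the problem is most likely an @attribute declaration on line 9 that is missing its name or its type, for example because a vocabulary word was empty or needed quoting.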

    Cheers,
    Mark.
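
    A quick way to find such a line before handing the file to Weka is to scan the header yourself. The following is a rough plain-Java sketch, not Weka's parser: `ArffHeaderCheck` and its methods are hypothetical names, the quoting rules are simplified (no escaped quotes), and the sample header is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArffHeaderCheck {

    // True if an "@attribute ..." line has both a name and a type.
    static boolean isComplete(String line) {
        String rest = line.trim().substring("@attribute".length()).trim();
        if (rest.isEmpty()) {
            return false;
        }
        String type;
        char first = rest.charAt(0);
        if (first == '\'' || first == '"') {
            // Quoted name: everything after the closing quote is the type.
            int close = rest.indexOf(first, 1);
            if (close < 0) {
                return false; // unterminated quote
            }
            type = rest.substring(close + 1).trim();
        } else {
            // Assumption: an unquoted '%' starts a comment, so anything
            // from there to the end of the line is ignored.
            int comment = rest.indexOf('%');
            if (comment >= 0) {
                rest = rest.substring(0, comment).trim();
            }
            String[] parts = rest.split("\\s+", 2);
            if (parts[0].isEmpty()) {
                return false; // no name left
            }
            type = parts.length > 1 ? parts[1].trim() : "";
        }
        return !type.isEmpty();
    }

    // Returns the 1-based line numbers of incomplete @attribute lines.
    static List<Integer> incompleteAttributeLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String lower = lines.get(i).trim().toLowerCase();
            if (lower.startsWith("@data")) {
                break; // header ends here
            }
            if (lower.startsWith("@attribute") && !isComplete(lines.get(i))) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "@relation reviews",
                "",
                "@attribute good numeric",
                "@attribute bad",                  // type missing
                "@attribute 'not bad' numeric",
                "@attribute % numeric",            // name swallowed by comment
                "@attribute @@class@@ {pos,neg}",
                "@data");
        System.out.println(incompleteAttributeLines(header)); // prints [4, 6]
    }
}
```

    When the attribute names come straight from a vocabulary, wrapping each name in quotes when writing the header avoids most of these cases, since words containing spaces, `%`, or other special characters otherwise break the declaration.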
