Thread: Find statistically significant words in text?

  1. #1
    Join Date
    Jul 2012
    Posts
    17

    Find statistically significant words in text?

    I have a database of text (product descriptions) associated with various quantities (price, popularity, rating, etc.).

    What is the easiest way to mine this text data to look for words that are statistically significantly correlated with any of the quantities?

    I think I should use a "regression" function of Weka. I'm a total newbie so any and all tips are appreciated.

    Thanks

  2. #2
    Join Date
    Aug 2006
    Posts
    1,068

    Yes, you'll want to use a regression scheme. I'd suggest using the StringToWordVector filter to build a dictionary and vectorize your text files. This filter also has options that let you control the size of the dictionary (e.g. limit it to the x most frequently occurring words). As the number of words is likely to be quite large, it would probably be best to apply support vector machines for regression (SMOreg), since SVMs work well in high-dimensional spaces. Using the default options will produce a linear model with a weight for each attribute, i.e. each word (rather than a weight for each support vector).
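
    To make the idea concrete, here is a rough sketch in plain Python. It is not Weka code: build_dictionary and vectorize only imitate what a word-vector filter with a dictionary-size limit does, and fit_linear is ordinary ridge-regularised least squares fitted by gradient descent, swapped in for SMOreg purely for illustration. All function names, parameters, descriptions and prices below are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(docs, max_words):
    """Keep the max_words most frequent words (ties broken alphabetically)."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_words]]


def vectorize(doc, dictionary):
    """Binary bag-of-words vector: 1.0 if the word occurs in doc, else 0.0."""
    present = set(tokenize(doc))
    return [1.0 if word in present else 0.0 for word in dictionary]


def fit_linear(X, y, l2=0.01, lr=0.1, epochs=2000):
    """Fit y ~ b + w.x by batch gradient descent with an L2 penalty on w.

    Illustrative stand-in for a linear regression scheme, not SMOreg.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
    return w, b


# Made-up toy data: product descriptions and their prices.
descriptions = [
    "premium leather wallet",
    "cheap plastic wallet",
    "premium leather bag",
    "cheap plastic bag",
]
prices = [80.0, 10.0, 120.0, 15.0]

dictionary = build_dictionary(descriptions, max_words=6)
X = [vectorize(doc, dictionary) for doc in descriptions]
w, b = fit_linear(X, prices)

# One weight per word: positive weights push the predicted price up.
weights = dict(zip(dictionary, w))
for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {weight:+.2f}")
```

    The point is only that a linear model gives you one weight per word, and the sign and size of that weight tell you how the word relates to the quantity. On real data you would let Weka do both steps.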

    Cheers,
    Mark.
