Text Classification



Houman Babai
02-14-2007, 08:06 PM
Hello,

I'm trying to use Weka to do text classification. I have no experience in machine learning, so please bear with me.

I'm trying to classify messages that are one sentence long. I've created a TF/IDF vector, but I'm not sure how to solve my problem using TF/IDF and Weka ;-)

Let's assume that the symbols 'A', 'B' & 'C' stand for vocabulary terms in my TF/IDF. Furthermore, let's assume that I have 2 classifications {1,2}. From the examples I've seen, I could create the following @data section for my arff file:

@data
%A_freq,B_freq,C_freq,classification
.20,.30,0,1
.40,0,0,2

Now when a new sentence comes in, I can see whether it has the term A, B or C in it, but how does the TF/IDF help me in this situation?

Your input would be greatly appreciated.

Thanks
Houman

Mark
03-01-2007, 05:11 PM
Hi,

This was answered by Peter on the Wekalist on the 16th of February. Basically, you just need to use the StringToWordVector filter on both your training data and each test instance.
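
To make the idea concrete, here is a minimal plain-Java sketch of what "use the same filter on training and test data" amounts to. It is not Weka code, and the class and method names (TfIdfSketch, fit, transform) are made up for illustration. It assumes one common TF/IDF variant (term frequency divided by sentence length, IDF = ln(N / document frequency)); the weighting Weka's filter applies may differ. The point is that the vocabulary and IDF weights are learned from the training sentences only, and a new sentence is mapped onto those same columns, so a classifier trained on the training vectors can score it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of TF/IDF vectorization; not part of Weka.
class TfIdfSketch {
    // Column order of every vector; fixed once training is done.
    final List<String> vocabulary = new ArrayList<>();
    // IDF weight per vocabulary term, learned from the training set only.
    final Map<String, Double> idf = new HashMap<>();

    // Lower-case the sentence and split on anything that is not a letter or digit.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Learn the vocabulary and IDF weights from the training sentences.
    static TfIdfSketch fit(List<String> trainingSentences) {
        TfIdfSketch model = new TfIdfSketch();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String s : trainingSentences) {
            Set<String> unique = new HashSet<>(tokenize(s));
            for (String term : unique) docFreq.merge(term, 1, Integer::sum);
        }
        // Sorted so that the column order is deterministic.
        model.vocabulary.addAll(new TreeSet<>(docFreq.keySet()));
        int n = trainingSentences.size();
        for (String term : model.vocabulary) {
            model.idf.put(term, Math.log((double) n / docFreq.get(term)));
        }
        return model;
    }

    // Map any sentence (training or new) onto the learned columns.
    // Terms that were never seen in training are ignored.
    double[] transform(String sentence) {
        List<String> tokens = tokenize(sentence);
        double[] vector = new double[vocabulary.size()];
        if (tokens.isEmpty()) return vector;
        for (String t : tokens) {
            int idx = vocabulary.indexOf(t);
            if (idx >= 0) vector[idx] += 1.0 / tokens.size();
        }
        for (int i = 0; i < vector.length; i++) {
            vector[i] *= idf.get(vocabulary.get(i));
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> training = List.of("a b", "a c");
        TfIdfSketch model = fit(training);
        // The new sentence gets the same columns as the training rows.
        double[] v = model.transform("b b d");
        for (int i = 0; i < v.length; i++) {
            System.out.println(model.vocabulary.get(i) + " = " + v[i]);
        }
    }
}
```

Each training sentence run through transform gives one row of the @data section (plus its class label); a new sentence run through the same transform gives a row with the same columns and an unknown class, which is what the classifier is asked to predict.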

Cheers,
Mark.

annmathew
04-14-2008, 03:14 AM
Hi Houman,
Hope you are fine. I am also new to text classification. Can you let me know how to proceed from the beginning when we have some text? Would you be able to help me?
regards
ann