Houman Babai
02-14-2007, 07:06 PM
Hello,
I'm trying to use Weka to do text classification. I have no experience in machine learning, so please bear with me.
I'm trying to classify messages that are one sentence long. I've created a TF/IDF vector for each message, but I'm not sure how to solve my problem using TF/IDF and Weka ;-)
Let's assume that the symbols 'A', 'B' & 'C' stand for vocabulary terms in my TF/IDF. Furthermore, let's assume that I have 2 classifications {1,2}. From the examples I've seen, I could create the following @data section for my arff file:
@data
%A_freq,B_freq,C_freq,classification
.20,.30,0,1
.40,0,0,2
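For context, an ARFF file also needs a header that declares each attribute before the @data section. Below is a minimal sketch in Python of how the complete file for the two rows above might be generated. The relation name "messages" and the helper name build_arff are made up for illustration; the attribute names simply follow the comment line in the data section.

```python
def build_arff(relation, terms, classes, rows):
    """Return ARFF text with one numeric attribute per term and a nominal class."""
    lines = [f"@relation {relation}", ""]
    for term in terms:
        # One numeric attribute per vocabulary term (its TF/IDF weight).
        lines.append(f"@attribute {term}_freq numeric")
    # The class is a nominal attribute listing every allowed label.
    lines.append("@attribute classification {" + ",".join(classes) + "}")
    lines.append("")
    lines.append("@data")
    for weights, label in rows:
        values = [format(w, "g") for w in weights]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


# The two example rows from the post; "messages" is a hypothetical relation name.
arff_text = build_arff(
    "messages",
    ["A", "B", "C"],
    ["1", "2"],
    [([0.20, 0.30, 0.0], "1"), ([0.40, 0.0, 0.0], "2")],
)
print(arff_text)
```

Declaring the class as a nominal attribute ({1,2}) rather than numeric is what tells a classifier to treat it as a category.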
Now when a new sentence comes in I can see if it has the term A, B or C in it, but how does the TF/IDF help me in this situation?
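To make the question concrete, here is a rough sketch of what I understand the usual approach to be: the IDF weights are computed once from the training sentences, and a new sentence is then turned into a vector over the same A, B, C attributes using those stored weights, so it can be handed to the trained classifier. The function names, the toy training sentences and the smoothed IDF formula are my own assumptions for illustration, not something taken from Weka.

```python
import math
from collections import Counter


def fit_idf(documents, vocabulary):
    """Compute one IDF weight per vocabulary term from tokenized training documents."""
    n = len(documents)
    idf = {}
    for term in vocabulary:
        df = sum(1 for doc in documents if term in doc)
        # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
        idf[term] = math.log((1 + n) / (1 + df)) + 1.0
    return idf


def tfidf_vector(tokens, vocabulary, idf):
    """Map a tokenized sentence onto the fixed vocabulary using stored IDF weights."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty sentence
    # Tokens outside the vocabulary (like "D" below) are simply ignored.
    return [counts[term] / total * idf[term] for term in vocabulary]


vocabulary = ["A", "B", "C"]
training = [["A", "B", "B"], ["A", "A"]]  # hypothetical tokenized training sentences
idf = fit_idf(training, vocabulary)

# A new, unlabeled sentence is vectorized with the training-time IDF weights.
new_vector = tfidf_vector(["B", "C", "D"], vocabulary, idf)
print(new_vector)
```

If that is right, the TF/IDF values would only be the feature values: a new sentence gets a row in the same format as the @data rows, with the class left unknown for the classifier to predict.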
Your input would be greatly appreciated.
Thanks
Houman