Thread: WEKA Functionality- XML, R, Web Content Mining??

  1. #1
    Join Date
    Aug 2012
    Posts
    3

    Hi


    I am new to WEKA and am hoping someone can help me with a few questions:

    1. Can WEKA handle/parse XML content?
    2. R has a plugin for WEKA. But does WEKA have an extension for R?
    3. I see WEKA can handle text processing. But does WEKA support web content mining?

    thanks
    Bonjovi2012

  2. #2
    Join Date
    Aug 2012
    Posts
    3

    I see that there is an RPlugin extension for WEKA that allows us to run R scripts in WEKA, so please ignore question 2.

    cheers

  3. #3
    Join Date
    Aug 2006
    Posts
    1,070

    Hi,

    WEKA has no built-in facility for parsing XML as such. I'd suggest using PDI (Pentaho Data Integration) to preprocess such documents.

    WEKA does have some text preprocessing facilities (basically conversion to the bag-of-words vector format), but nothing for dealing with HTML web pages. Again, these would need to be preprocessed to extract the textual content (and perhaps link information, metadata, etc.) before conversion to the bag-of-words format. Once that is done, standard propositional classification/clustering can be applied.
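
    To make the preprocessing step concrete, here is a minimal sketch in Python (standard library only) of pulling the visible text and link targets out of an HTML page and counting words. It is only an illustration of the idea, not something WEKA or PDI provides: the names PageExtractor, extract and bag_of_words are made up for this example, and the tokenizer is deliberately crude. Inside WEKA, the bag-of-words conversion itself would normally be handled by the StringToWordVector filter once the plain text has been extracted.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and link targets."""

    SKIP = {"script", "style"}  # content of these tags is not visible text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            # Keep link information in case it is useful as extra features.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore anything that sits inside <script> or <style>.
        if self._skip_depth == 0:
            self.text_parts.append(data)


def extract(html):
    """Return (visible_text, list_of_hrefs) for one HTML document."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def bag_of_words(text):
    """Lowercase, split on non-alphanumerics, and count each token."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


if __name__ == "__main__":
    page = (
        "<html><head><title>Demo</title></head>"
        "<body><p>Some text to mine.</p>"
        "<a href='http://example.org/next'>next page</a></body></html>"
    )
    text, links = extract(page)
    print(bag_of_words(text))
    print(links)
```

    The extracted text (one string per page) could then be written out as a string attribute in an ARFF or CSV file and loaded into WEKA for the bag-of-words conversion and the usual classification/clustering.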

    Cheers,
    Mark.

  4. #4
    Join Date
    Aug 2012
    Posts
    3

    Thanks Mark
