Hitachi Vantara Pentaho Community Forums

Thread: Extremely Urgent!! Restart job

  1. #1
    Join Date
    Nov 2015
    Posts
    2

    Hi,

    I'm running a filter (attribute selection) in Weka. It has been running for a month (the file is really big).
    However, my server is going to be restarted in a few hours.
    Is it possible to save what has been done so far and resume it after the machine has been rebooted?
    I'm running it from the GUI, and I could not find any temporary file created by Weka.

    Please help!!

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Hi. I'm afraid you're out of luck. There is no way to resume most learning processes in Weka (the only exceptions are incremental classifiers that have been periodically saved to disk). I'm curious to know which feature selection process you are running. Most are fairly simple and are quite likely to give reasonably similar results on a subset of the data compared to the full dataset. One exception to this might be the Wrapper approach (depending on the complexity of the base learner).
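
    To illustrate what "periodically saved to disk" can mean for an incremental learner, here is a minimal sketch in plain Python. It is not Weka code: the `CountingLearner` class, the `train` function, the JSON checkpoint format, and the `checkpoint_every` parameter are all hypothetical names made up for this example. It assumes the input can be replayed in the same order after a restart.

```python
import json
import os
import tempfile


class CountingLearner:
    """Toy incremental learner (hypothetical): keeps per-class counts."""

    def __init__(self, seen=0, class_counts=None):
        self.seen = seen
        self.class_counts = dict(class_counts or {})

    def update(self, label):
        # One instance at a time, which is what makes checkpointing possible.
        self.seen += 1
        self.class_counts[label] = self.class_counts.get(label, 0) + 1


def save_checkpoint(model, path):
    # Write to a temporary file, then rename, so an interrupted write
    # cannot leave a truncated checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"seen": model.seen, "class_counts": model.class_counts}, handle)
    os.replace(tmp_path, path)


def load_or_create(path):
    if os.path.exists(path):
        with open(path) as handle:
            state = json.load(handle)
        return CountingLearner(state["seen"], state["class_counts"])
    return CountingLearner()


def train(stream, path, checkpoint_every=1000):
    model = load_or_create(path)
    already_seen = model.seen
    for index, label in enumerate(stream):
        if index < already_seen:
            continue  # consumed before the restart, skip it
        model.update(label)
        if model.seen % checkpoint_every == 0:
            save_checkpoint(model, path)
    save_checkpoint(model, path)
    return model
```

    Calling `train` again with the same checkpoint path after a reboot picks up from the last saved count instead of starting over. A batch process that never writes its state, like the filter in the question, has nothing comparable to reload.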

    Cheers,
    Mark.

  3. #3
    Join Date
    Nov 2015
    Posts
    2

    Hi. Thank you anyway for your answer.
    I was running a dataset of 1.5 million attributes and 100 instances for each. To reduce the size of the set and get rid of attributes that are not correlated with the class, I was using the CfsSubsetEval evaluator with the BestFirst search in the Attribute Selection filter. I'm not sure it is the best choice for my dataset.
    I think I'm going to start the calculation again, but maybe reduce the number of attributes a bit by eliminating those I think are not so important. Could you give me some advice on that?
    I was also trying to get the correlation matrix at the end, but I could not find it in Weka. Does the program display it at the end? If not, I think I can compute it with Matlab.
    Thank you so much,
    Alicia

  4. #4
    Join Date
    Aug 2006
    Posts
    1,741

    I would suggest trying a ranking scheme on that dataset first and choosing the top n features :-) CFS has a runtime that is quadratic in the number of features :-) Computing a correlation matrix is quadratic, and the BestFirst/GreedyStepwise searches are quadratic in the worst case. Only the simple ranking schemes (info gain, gain ratio, chi squared, etc.) have a runtime that is linear in the number of attributes. What you could do is use a ranking scheme to whittle down the number of features to something reasonable (e.g. in the tens or perhaps hundreds of thousands) and then run something like CFS on this reduced set.
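
    The "rank first, keep the top n" step can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Weka's implementation: it scores each attribute by the absolute Pearson correlation with a numeric class (swapped in for info gain or chi squared to keep the example short), and the names `pearson` and `top_n_attributes` are made up here. The point is that each attribute is scored once, independently of the others, so the cost grows linearly with the number of attributes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column tells us nothing about the class
    return cov / math.sqrt(var_x * var_y)


def top_n_attributes(columns, target, n):
    """Return the indices of the n attributes most correlated with target.

    columns: one list of values per attribute (hypothetical layout).
    target:  the class values, encoded as numbers.
    """
    # One score per attribute: a single pass over the attributes.
    scored = [(abs(pearson(col, target)), idx) for idx, col in enumerate(columns)]
    # Highest score first; ties broken by attribute index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:n]]


if __name__ == "__main__":
    columns = [
        [1, 2, 3, 4],      # tracks the class exactly
        [1, 1, 1, 1],      # constant
        [4, 3, 2, 1.5],    # strongly anti-correlated
        [1, 3, 2, 4],      # loosely correlated
    ]
    target = [1, 2, 3, 4]
    print(top_n_attributes(columns, target, 2))
```

    Only the attributes that survive this cut would then be passed to a subset evaluator such as CFS.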

    Unfortunately, CFS does not output the actual correlation matrix.
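
    If the matrix is needed, it can be computed separately once the attribute set is small enough. The sketch below is plain Python with made-up names (`pair_count`, `standardize`, `correlation_matrix`) and assumes the data is held as one list of numbers per attribute. `pair_count` shows why this is only practical on a reduced set: k attributes give k(k-1)/2 distinct pairs.

```python
import math


def pair_count(k):
    """Number of distinct attribute pairs among k attributes."""
    return k * (k - 1) // 2


def standardize(column):
    """Center a column and scale it to unit length."""
    n = len(column)
    mean = sum(column) / n
    centered = [value - mean for value in column]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        # Constant column: correlation is undefined, reported as 0 here.
        return [0.0] * n
    return [c / norm for c in centered]


def correlation_matrix(columns):
    """Pearson correlation matrix; quadratic in the number of columns."""
    z = [standardize(col) for col in columns]
    k = len(z)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            # After standardizing, the correlation is a plain dot product.
            r = sum(a * b for a, b in zip(z[i], z[j]))
            matrix[i][j] = r
            matrix[j][i] = r
    return matrix


if __name__ == "__main__":
    print(pair_count(10_000))      # pairs on a reduced set
    print(pair_count(1_500_000))   # pairs on the full 1.5 million attributes
    for row in correlation_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]]):
        print(row)
```

    With 10,000 attributes there are about 50 million pairs; with all 1.5 million there are more than a trillion.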

    Cheers,
    Mark.
