Monte Carlo simulation

greg.soulsby

05-16-2009, 05:02 AM

I want to take the results from Weka models and feed them into optimisation and risk models in @Risk Monte Carlo simulations. I can't see why that step could not be brought back into my Weka KnowledgeFlow. Where can I start looking for a solution to that approach?

I'm not sure I understand what you want to do here. Do you want to optimize the parameters of a learning algorithm in order to maximize an evaluation metric on your data? If so, then the Experimenter environment and/or the GridSearch meta classifier is probably what you'd want to use.

Cheers,

Mark.

greg.soulsby

05-17-2009, 06:52 AM

Mark, that is our first step, which is great.

For example's sake, we want to do that twice: once to score customers with a propensity to buy, and then, for customers who buy, to predict the likely margin.

Combining the two models gives us a probability profile for the expected margin for a "marketing campaign".

In the @Risk Monte Carlo tool we run 1000 virtual customers through both models, using the random number generator to give each customer a behaviour based on the probabilities from the 2 models.

Then we have an If statement: If [model_1_result] = "customer_bought" then Expected_margin = [model_2_margin], else Expected_margin = 0.

The profile of those 1000 expected_margins is our simulated probability of different total margins for the campaign.
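The simulation step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not @Risk's or Weka's actual mechanism: the lists `buy_probs` and `margins` are hypothetical stand-ins for the outputs of the two models, and the function names are invented for this example. It also assumes that the campaign is repeated many times so that the distribution of the total margin can be profiled.

```python
import random


def simulate_customer_margins(buy_probs, margins, rng):
    """Draw one buy/no-buy outcome per customer and return the realised margins."""
    realised = []
    for p_buy, margin in zip(buy_probs, margins):
        bought = rng.random() < p_buy  # Bernoulli draw from model 1's probability
        # The If statement: margin from model 2 if the customer bought, else 0.
        realised.append(margin if bought else 0.0)
    return realised


def simulate_total_margins(buy_probs, margins, n_runs=1000, seed=0):
    """Repeat the campaign n_runs times and collect the total margin of each run."""
    rng = random.Random(seed)
    return [
        sum(simulate_customer_margins(buy_probs, margins, rng))
        for _ in range(n_runs)
    ]


# Hypothetical model outputs for four customers (assumed values, not real data).
buy_probs = [0.10, 0.35, 0.80, 0.05]   # model 1: probability of buying
margins = [120.0, 40.0, 15.0, 300.0]   # model 2: predicted margin if they buy

totals = sorted(simulate_total_margins(buy_probs, margins))
mean_total = sum(totals) / len(totals)
low = totals[len(totals) * 5 // 100]    # roughly the 5th percentile
high = totals[len(totals) * 95 // 100]  # roughly the 95th percentile
print("mean total margin:", mean_total)
print("approx. 5th / 95th percentile:", low, high)
```

In practice the two lists would hold one entry per scored customer, and the sorted totals give the profile of possible campaign outcomes.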

Weka has the tools for the model build at the start, and something like the assessment tools for profiling the result. What I am looking for is the "random number generator / pump it into a model" bit for the Monte Carlo simulation in the middle. Thoughts?

Greg

Hi Greg,

I have to admit that I don't know much about risk assessment :-) A couple of questions/points occur to me though.

1) Why do you use an if statement in the computation of expected margin? Shouldn't it just be predicted_prob_of_buying * predicted_margin? Otherwise you are assuming that the default threshold (0.5) on the probability of buying is suitable for your application. In most scenarios, class distributions are skewed and improvement can be had by selecting a threshold other than 0.5. In your scenario, you could just rank all customers by expected margin and explore various thresholds on expected margin (with respect to the costs/benefits involved).
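A minimal sketch of the ranking idea in point 1, in Python. The customer ids, probabilities, margins and the `contact_cost` threshold are all hypothetical values chosen for illustration; the function names are invented here and are not part of Weka.

```python
def expected_margins(buy_probs, margins):
    """Expected margin per customer: predicted_prob_of_buying * predicted_margin."""
    return [p * m for p, m in zip(buy_probs, margins)]


def select_customers(customer_ids, buy_probs, margins, contact_cost):
    """Rank customers by expected margin and keep those above the threshold."""
    scored = sorted(
        zip(customer_ids, expected_margins(buy_probs, margins)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(cid, em) for cid, em in scored if em > contact_cost]


# Hypothetical model outputs (assumed values, not real data).
customer_ids = ["a", "b", "c", "d"]
buy_probs = [0.10, 0.35, 0.80, 0.05]
margins = [100.0, 40.0, 20.0, 300.0]

# Assumed cost of contacting one customer; vary it to explore thresholds.
selected = select_customers(customer_ids, buy_probs, margins, contact_cost=12.0)
for cid, em in selected:
    print(cid, round(em, 2))
```

Note that customer "d" ranks highly despite a buy probability far below 0.5, which is the case a fixed 0.5 threshold on the classifier output would miss.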

2) Weka doesn't have a facility to generate random data based on arbitrary distributions for each input variable (this is what the Monte Carlo risk assessment process does, if I understand correctly). [This is where I probably show my ignorance :-)] Depending on how many input variables you have, and if they are treated independently of one another given the class label, I can imagine that you quickly run into issues with the "curse of dimensionality" and would need a huge number of trials to get even semi-reasonable coverage of the input space. With only a thousand trials, it could be the case that you populate areas of the space that are less likely to occur in practice, which in turn would lead to a pretty non-representative probability distribution over the outcomes. If this is indeed what you want to do, then Weka has an implementation of the NaiveBayes classifier, which models p(att|class) for each attribute individually. It would be possible to write some code to produce a data generator based on this. It could also be extended to Bayes nets if you know the attribute dependencies involved (although handling numeric attributes is tricky).
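To make the data generator idea in point 2 concrete, here is a rough Python sketch of sampling from a naive Bayes style model: draw a class from the class prior, then draw each attribute independently from p(att|class). It does not use Weka's API. The priors, the attribute names and the distribution parameters are hypothetical; in a real setup they would be estimated from training data. Treating numeric attributes as Gaussian is also an assumption made for this example.

```python
import random


def sample_instance(class_priors, attribute_models, rng):
    """Sample a class label, then each attribute independently given that class."""
    labels = list(class_priors)
    label = rng.choices(labels, weights=[class_priors[c] for c in labels])[0]
    instance = {"class": label}
    for name, model in attribute_models[label].items():
        if model["type"] == "nominal":
            values = list(model["probs"])
            weights = [model["probs"][v] for v in values]
            instance[name] = rng.choices(values, weights=weights)[0]
        else:
            # Numeric attribute, assumed Gaussian given the class.
            instance[name] = rng.gauss(model["mean"], model["std"])
    return instance


# Hypothetical parameters (assumed values, not estimated from real data).
class_priors = {"buy": 0.2, "no_buy": 0.8}
attribute_models = {
    "buy": {
        "region": {"type": "nominal", "probs": {"north": 0.7, "south": 0.3}},
        "age": {"type": "numeric", "mean": 45.0, "std": 8.0},
    },
    "no_buy": {
        "region": {"type": "nominal", "probs": {"north": 0.4, "south": 0.6}},
        "age": {"type": "numeric", "mean": 35.0, "std": 12.0},
    },
}

rng = random.Random(42)
virtual_customers = [
    sample_instance(class_priors, attribute_models, rng) for _ in range(5)
]
for customer in virtual_customers:
    print(customer)
```

Because each attribute is drawn independently given the class, the generator inherits the naive Bayes independence assumption and will not reproduce correlations between attributes.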

Cheers,

Mark.

meow44

06-04-2009, 11:59 AM

this is interesting. thanks for sharing;)