Hey all,

I am currently working on my bachelor's thesis and examining a Bayesian network approach on a dataset of 82 instances.
While researching methods, I chose the TAN method [FGG97], because it is reported there to perform significantly better than the naive Bayes method.


But I got confused:
Weka implements TAN as both a local and a global search method, which differ in how the resulting network is evaluated.

What I do not understand is that the TAN method (according to [FGG97]) uses the (conditional) mutual information measure,
yet I am able to specify other scores for the network, such as Bayes, AIC and so on.


My questions are:
0) At which point are the scores (Bayes, AIC, ...) applied, and do they replace the (conditional) mutual information "score"?
1) Are these search methods incremental in the sense that the TAN algorithm is applied multiple times? If so, which parameters are varied?
2) TAN itself returns only one spanning tree, the one that maximizes the conditional mutual information. Why is any scoring done here if only one tree is returned (related to 1))?

In [FGG97] the method of constructing a TAN is defined as follows:

1. Compute $I_{\hat{P}_D}(A_i; A_j \mid C)$ between each pair of attributes, $i \neq j$.
2. Build a complete undirected graph in which the vertices are the attributes $A_1, \ldots, A_n$.
   Annotate the weight of an edge connecting $A_i$ to $A_j$ by $I_{\hat{P}_D}(A_i; A_j \mid C)$.
3. Build a maximum weighted spanning tree.
4. Transform the resulting undirected tree to a directed one by choosing a root variable
   and setting the direction of all edges to be outward from it.
5. Construct a TAN model by adding a vertex labeled by C and adding an arc from C to
   each $A_i$.
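
To make sure I read the five steps correctly, here is a minimal Python sketch of how I understand them. It is my own illustration, not Weka's implementation and not code from [FGG97]. The function names `conditional_mutual_information` and `build_tan` are made up, and I assume discrete attributes, probabilities estimated from raw counts without smoothing, and Prim's algorithm as one possible way to get the maximum weighted spanning tree.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats, estimated from raw counts (no smoothing)."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # P(x,y,c) * log( P(x,y|c) / (P(x|c) * P(y|c)) ), written with counts
        ratio = (count * n_c[c]) / (n_xc[(x, c)] * n_yc[(y, c)])
        total += (count / n) * math.log(ratio)
    return total


def build_tan(columns, labels, root=0):
    """Return {attribute index: [parents]}; the class vertex is named "C".

    `columns` is a list with one value sequence per attribute (assumed layout).
    """
    n_attrs = len(columns)

    # Steps 1 and 2: weight every undirected edge {i, j} of the complete graph.
    weight = {}
    for i in range(n_attrs):
        for j in range(i + 1, n_attrs):
            w = conditional_mutual_information(columns[i], columns[j], labels)
            weight[(i, j)] = weight[(j, i)] = w

    # Steps 3 and 4: Prim's algorithm grown from the root. Remembering which
    # tree vertex each new vertex attaches to directs all edges away from root.
    parents = {root: ["C"]}
    in_tree = {root}
    while len(in_tree) < n_attrs:
        u, v = max(
            ((u, v) for u in sorted(in_tree)
             for v in range(n_attrs) if v not in in_tree),
            key=lambda edge: weight[edge],
        )
        # Step 5: every attribute also gets the class C as a parent.
        parents[v] = [u, "C"]
        in_tree.add(v)
    return parents
```

In this reading, `build_tan` is deterministic for a given dataset and root (up to ties between equal weights), and no Bayes or AIC score appears anywhere in it, which is exactly why the scoring options confuse me.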


I guess I do not understand the search method methodology as a whole, and I have not been able to find literature that helps me.


Thank you so much for your help.


Best,
Markus


References:

[FGG97] N. Friedman, D. Geiger, M. Goldszmidt (1997). Bayesian network
classifiers. Machine Learning. 29(2-3):131-163.