Databases for learning Bayesian networks
This page contains links to various databases of cases in Hugin format. Each database consists of cases sampled from a Bayesian network presented in the book. The tasks below are to use a BN-learning system of your own choice and to investigate the result. In general, you should not expect to retrieve exactly the original network. To evaluate the result, you may first compare the d-separation properties of the two networks. If the networks differ but their d-separation properties are identical, you cannot hope for anything better, since networks with the same d-separation properties can represent the same distributions. If the learned network has d-separation properties not shared with the original network, this may be a starting point for looking more closely at why your system goes wrong.
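The d-separation comparison can be mechanized with a standard result: two DAGs over the same variables have identical d-separation properties exactly when they have the same skeleton (the links with directions ignored) and the same v-structures (pairs of non-adjacent parents sharing a child). The sketch below assumes each network has already been written as a hypothetical Python dict from node name to its set of parents; it does not read the Hugin files.

```python
from itertools import combinations

Dag = dict[str, set[str]]  # hypothetical representation: node -> set of parents


def skeleton(dag: Dag) -> set[frozenset[str]]:
    """Undirected links of the DAG."""
    return {frozenset((p, c)) for c, parents in dag.items() for p in parents}


def v_structures(dag: Dag) -> set[tuple[frozenset[str], str]]:
    """All a -> child <- b where a and b are not linked."""
    skel = skeleton(dag)
    found = set()
    for child, parents in dag.items():
        for a, b in combinations(sorted(parents), 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), child))
    return found


def same_d_separation(dag1: Dag, dag2: Dag) -> bool:
    """True iff the DAGs are Markov equivalent (same d-separation properties)."""
    return (dag1.keys() == dag2.keys()
            and skeleton(dag1) == skeleton(dag2)
            and v_structures(dag1) == v_structures(dag2))


original = {"A": set(), "B": {"A"}, "C": {"B"}}        # A -> B -> C
learned = {"A": {"B"}, "B": set(), "C": {"B"}}         # A <- B -> C
collider = {"A": set(), "B": {"A", "C"}, "C": set()}   # A -> B <- C

print(same_d_separation(original, learned))
print(same_d_separation(original, collider))
```

Here the learned chain differs from the original in link directions but is equivalent to it, whereas the collider introduces a v-structure and therefore new d-separation properties.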
Another way to evaluate the differences is to run the cases through both networks and compare their scores, for example the likelihood each network assigns to the database.
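One possible reading of "score" is the log-likelihood of the cases, computed with the chain rule P(U) = product of P(A | pa(A)) over all variables. A minimal sketch, assuming a hypothetical dict representation of a network (node mapped to its parents and its conditional probability table) and complete cases; the two small networks are made up for illustration:

```python
import math

# Hypothetical representation: node -> (parents, CPT), where the CPT maps a tuple
# of parent states to a distribution over the node's own states.
Network = dict[str, tuple[tuple[str, ...], dict[tuple[str, ...], dict[str, float]]]]


def log_likelihood(net: Network, cases: list[dict[str, str]]) -> float:
    """Sum of log P(case) over all complete cases (fails on zero probabilities)."""
    total = 0.0
    for case in cases:
        for node, (parents, cpt) in net.items():
            parent_states = tuple(case[p] for p in parents)
            total += math.log(cpt[parent_states][case[node]])
    return total


# Two hypothetical candidate networks over the variables A and B.
with_link = {
    "A": ((), {(): {"y": 0.5, "n": 0.5}}),
    "B": (("A",), {("y",): {"y": 0.9, "n": 0.1}, ("n",): {"y": 0.2, "n": 0.8}}),
}
without_link = {
    "A": ((), {(): {"y": 0.5, "n": 0.5}}),
    "B": ((), {(): {"y": 0.55, "n": 0.45}}),
}
cases = [{"A": "y", "B": "y"}, {"A": "n", "B": "n"}, {"A": "y", "B": "y"}]

print(log_likelihood(with_link, cases), log_likelihood(without_link, cases))
```

The higher (less negative) value indicates the network that fits the cases better. Plain likelihood favors networks with more links, so scores such as BIC subtract a penalty for the number of parameters.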
You may download tools for BN-learning at the following sites:
http://b-course.cs.helsinki.fi, http://bayesware.com, www.hugin.com, http://research.microsoft.com/~dmax/WinMine/Tool.doc.htm
On the Datamine web page you will find links to other machine learning tools.
Complete cases
The links below give access to databases consisting of 10 000 complete cases (for each case, the state of each variable is known).
Angina | Pregnancy | Poker (Opponent's hand) | Poker (Best hand)
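With complete cases and a given structure, the maximum-likelihood parameters are plain relative frequencies: P(A = a | pa(A) = pa) = N(a, pa) / N(pa). A minimal sketch, assuming the cases have already been loaded into hypothetical Python dicts (parsing the Hugin files is not shown):

```python
from collections import Counter


def estimate_cpt(cases: list[dict[str, str]], node: str,
                 parents: tuple[str, ...]) -> dict[tuple[str, ...], dict[str, float]]:
    """Maximum-likelihood CPT for `node` given `parents`, by counting."""
    parent_counts = Counter(tuple(c[p] for p in parents) for c in cases)
    joint_counts = Counter((tuple(c[p] for p in parents), c[node]) for c in cases)
    cpt: dict[tuple[str, ...], dict[str, float]] = {}
    for (pa, state), n in joint_counts.items():
        cpt.setdefault(pa, {})[state] = n / parent_counts[pa]
    return cpt


# Hypothetical complete cases over A and B.
cases = [
    {"A": "y", "B": "y"},
    {"A": "y", "B": "n"},
    {"A": "y", "B": "y"},
    {"A": "n", "B": "n"},
]
print(estimate_cpt(cases, "B", ("A",)))
```

States never observed for a parent configuration get no entry, that is, probability zero; adding pseudo-counts to all cells is a common way to avoid this.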
Missing values
The links below give access to databases containing 10 000 cases in which some values are missing. The values are missing completely at random. A suffix "1" indicates that the probability of a missing value is 0.1, and a suffix "3" indicates that it is 0.3.
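"Missing completely at random" means each value is hidden independently of everything else, with the same probability p. A case with n variables therefore stays complete with probability (1 - p)^n, which is why simply discarding incomplete cases wastes most of the data at p = 0.3. A minimal simulation sketch; the 5 binary variables and their uniform values are hypothetical, not taken from the book's networks:

```python
import random


def mask_mcar(cases, p, seed=0):
    """Replace each value by None independently with probability p."""
    rng = random.Random(seed)
    return [{var: (None if rng.random() < p else val) for var, val in case.items()}
            for case in cases]


# Hypothetical complete database: 10 000 cases over 5 binary variables.
variables = ["V1", "V2", "V3", "V4", "V5"]
gen = random.Random(1)
complete = [{v: gen.choice(["y", "n"]) for v in variables} for _ in range(10_000)]

masked = mask_mcar(complete, p=0.3)
n_values = len(masked) * len(variables)
missing_rate = sum(val is None for case in masked for val in case.values()) / n_values
complete_rate = sum(all(val is not None for val in case.values())
                    for case in masked) / len(masked)
print(f"missing {missing_rate:.3f}, complete cases {complete_rate:.3f}, "
      f"expected {0.7 ** 5:.3f}")
```

With p = 0.3 and 5 variables only about 0.7^5 ≈ 0.17 of the cases remain complete; learning methods such as EM use the partially observed cases as well.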
Hidden variable
This database contains 10 000 cases sampled from the Pregnancy network, with the variable Ho never observed. The real learning task is to detect the hidden variable. None of the tools above can detect hidden variables, nor do they offer facilities for looking for indications of them.
Structural constraint
This database contains 100 000 complete cases sampled from a three-time-slice model of Infected milk (§2.2.1). The model used is the one in Figure 2.12. The links for the variables Corrcti are known in advance, and so are their potentials. The task is to investigate whether your BN-learning system has a facility for clamping structure as well as potentials before learning. Note: It seems that only Hugin has the option of fixing parts of the structure, and even for Hugin it is not straightforward: you have to use the API. The Microsoft tool allows you to specify that a variable has no parents or that it has no children.
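In a search-based learner, clamping part of the structure amounts to rejecting every candidate network that violates the constraints (clamped potentials would likewise be kept out of parameter estimation, which is not shown). A minimal sketch of such a filter, covering required and forbidden links plus the "no parents" and "no children" restrictions mentioned for the Microsoft tool; the class, its fields, and the variable names A, B, C are hypothetical and not the API of any of the tools above:

```python
from dataclasses import dataclass, field

Dag = dict[str, set[str]]  # hypothetical representation: node -> set of parents


@dataclass
class Constraints:
    """Hypothetical structural constraints for a structure search."""
    required: set[tuple[str, str]] = field(default_factory=set)   # (parent, child) links that must be present
    forbidden: set[tuple[str, str]] = field(default_factory=set)  # (parent, child) links that must be absent
    no_parents: set[str] = field(default_factory=set)
    no_children: set[str] = field(default_factory=set)

    def allows(self, dag: Dag) -> bool:
        links = {(p, c) for c, parents in dag.items() for p in parents}
        if not self.required <= links or links & self.forbidden:
            return False
        if any(dag.get(v) for v in self.no_parents):
            return False
        return not any(p in self.no_children for p, _ in links)


constraints = Constraints(required={("A", "C")}, no_parents={"A"}, no_children={"C"})
candidates = {
    "keeps constraints": {"A": set(), "B": {"A"}, "C": {"A", "B"}},
    "A gets a parent": {"A": {"B"}, "B": set(), "C": {"A"}},
    "C gets a child": {"A": set(), "B": {"C"}, "C": {"A"}},
}
for name, dag in candidates.items():
    print(name, constraints.allows(dag))
```

A greedy or other search procedure would call `allows` on each proposed change and skip the ones that break the clamped structure.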