A recurrent neural network, modified to handle highly incomplete training data, is described. Unsupervised pattern recognition is demonstrated in the WHO database of adverse drug reactions. The network is compared with a well-established method, AutoClass, and the performance of both methods is investigated on simulated data. The neural network method performs comparably to AutoClass on simulated data, and better than AutoClass on real-world data. With its better scaling properties, the neural network is a promising tool for unsupervised pattern recognition in huge databases of incomplete observations.
Left: Schematic description of a simple recurrent BCPNN. Middle: 12 training samples at completeness level 50% and noise level 0%. The top four samples are drawn from the diamond prototype, the middle four from the rectangle prototype, and the bottom four are pure noise. The underlying prototypes are barely distinguishable. Right: Sample output at 50% completeness and 50% noise, including 4000 pure noise samples, from BCPNN as well as AutoClass, the latter thresholded.
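To make the completeness and noise levels in the caption concrete, the following is a minimal sketch of how such simulated samples might be generated. It is not the procedure used in the paper: it assumes that "completeness" is the probability that a pixel is observed at all and that "noise" is the probability that an observed pixel is flipped. The 5x5 `diamond` pattern and the function names `make_sample` and `pure_noise_sample` are hypothetical and chosen only for illustration.

```python
import random

def make_sample(prototype, completeness, noise, rng):
    """Return a degraded copy of a binary prototype.

    Assumed definitions (not taken from the paper): each entry is observed
    with probability `completeness`, otherwise it is None (missing); an
    observed entry is flipped with probability `noise`.
    """
    sample = []
    for bit in prototype:
        if rng.random() >= completeness:
            sample.append(None)       # entry is unobserved
        elif rng.random() < noise:
            sample.append(1 - bit)    # entry is observed but corrupted
        else:
            sample.append(bit)        # entry is observed correctly
    return sample

def pure_noise_sample(length, completeness, rng):
    """Return a sample with no underlying prototype: random observed bits."""
    return [rng.randint(0, 1) if rng.random() < completeness else None
            for _ in range(length)]

# Hypothetical 5x5 diamond prototype, flattened row by row.
diamond = [0, 0, 1, 0, 0,
           0, 1, 0, 1, 0,
           1, 0, 0, 0, 1,
           0, 1, 0, 1, 0,
           0, 0, 1, 0, 0]

rng = random.Random(0)
# Four samples at 50% completeness and 0% noise, as in the middle panel.
samples = [make_sample(diamond, 0.5, 0.0, rng) for _ in range(4)]
noise_samples = [pure_noise_sample(len(diamond), 0.5, rng) for _ in range(4)]
```

Under these assumptions, a sample at 0% noise agrees with its prototype wherever it is observed, so the difficulty at 50% completeness comes entirely from the missing entries.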
Last modified: Fri Oct 31 09:59:30 CET 2005