Title: Explanation from neural networks

Neural networks have frequently been found to give accurate solutions to hard classification problems. However, neural networks do not produce explained classifications, because the class boundaries are defined only implicitly by the network weights, and these weights do not lend themselves to simple analysis. Explanation is desirable because it gives insight into the problem, both to the designer and to the user of the classifier. Many methods have been suggested for explaining the classifications given by a neural network, but they all suffer from one or more of the following disadvantages: a lack of equivalence between the network and the explanation; the absence of a probability framework required to express the uncertainty present in the data; a restriction to problems with binary or coarsely discretised features; or reliance on axis-aligned rules, which are intrinsically poor at describing the boundaries generated by neural networks.

The solution presented in this thesis rests on the following steps. First, train a standard neural network to estimate the class conditional probabilities; Bayes' rule then defines the optimal class boundaries. Second, obtain an explicit representation of these class boundaries using a piecewise linearisation technique; note that the boundaries are otherwise only implicitly defined by the network weights. Third, obtain a safe but possibly partial description of this explicit representation using rules based upon the city-block distance to a prototype pattern. The methods required to achieve the last two steps represent the novel work of this thesis, which seeks to explain the answers given by a proven neural network solution to the classification problem.
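The two ingredients named above, the Bayes decision implied by estimated class probabilities and a rule phrased as a city-block (L1) distance threshold around a prototype pattern, can be sketched as follows. This is an illustrative sketch only, not the thesis's implementation: the function names `bayes_decision` and `cityblock_rule`, and the fixed threshold, are assumptions for the example.

```python
import numpy as np

def bayes_decision(posteriors):
    # Bayes' rule for minimum-error classification: assign the input to the
    # class whose estimated probability is largest. A trained network is
    # assumed to supply the `posteriors` vector for a given input.
    return int(np.argmax(posteriors))

def cityblock_rule(x, prototype, threshold):
    # A rule of the kind described above: it fires when the city-block
    # (L1) distance from input `x` to a prototype pattern is within a
    # threshold. A "safe" rule's region lies entirely on one side of the
    # class boundary, so firing never contradicts the Bayes decision,
    # though a set of such rules may cover the input space only partially.
    d = np.sum(np.abs(np.asarray(x, dtype=float) - np.asarray(prototype, dtype=float)))
    return d <= threshold
```

For example, `bayes_decision([0.1, 0.7, 0.2])` selects class 1, and `cityblock_rule([0.2, 0.8], [0.0, 1.0], 0.5)` fires because the L1 distance is 0.4.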
