Major publications of Shotaro Akaho
EM algorithm and mixture models (see also: multimodal)
- S. Akaho:
Mixture model for image understanding and the EM algorithm
We present a mixture model that can be applied to
the recognition of multiple objects in an image plane.
The model consists of submodules of arbitrary shape.
Each submodule is a probability density function of data points
with scale and shift parameters,
and the submodules are combined with mixing weights.
We present the EM (Expectation-Maximization) algorithm to estimate
those parameters.
We also modify the algorithm for the case in which the data points are restricted
to an attention window.
- S. Akaho:
The EM Algorithm for multiple object recognition
We propose a mixture model that can be applied to
the recognition of multiple objects in an image plane.
The model consists of modules of arbitrary shape.
Each module is a probability density function of data points
with scale and shift parameters,
and the modules are combined with mixing weights.
We present the EM (Expectation-Maximization) algorithm to estimate
those parameters.
We also modify the algorithm for the case in which the data points are restricted
to an attention window.
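The two entries above describe the same model, so a single illustration suffices.
Below is a minimal sketch of the EM loop, assuming one-dimensional Gaussian
submodules with shift (mean) and scale (standard deviation) parameters; all names
and default values are illustrative and not taken from the papers.

    import numpy as np

    def em_mixture(x, K, n_iter=100):
        """Minimal EM for a 1-D mixture of K Gaussian submodules, each with a
        shift (mu) and scale (sigma) parameter and a mixing weight w.
        x is a 1-D numpy array of data points."""
        n = len(x)
        rng = np.random.default_rng(0)
        mu = rng.choice(x, K)                  # shift parameters
        sigma = np.full(K, x.std())            # scale parameters
        w = np.full(K, 1.0 / K)                # mixing weights
        for _ in range(n_iter):
            # E-step: responsibility of each submodule for each data point
            d = (x[:, None] - mu) / sigma
            p = w * np.exp(-0.5 * d ** 2) / (sigma * np.sqrt(2 * np.pi))
            r = p / p.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights, shifts and scales
            nk = r.sum(axis=0)
            w = nk / n
            mu = (r * x[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        return w, mu, sigma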
- S. Akaho and H.J. Kappen:
Nonmonotonic generalization bias of Gaussian mixture models
Most theories of the generalization performance of learning tell us
that the generalization bias, which is defined as
the difference between the training error and the generalization error,
increases on average in proportion to
the number of modifiable parameters.
The present paper, however, reports a case in which
the generalization bias of a Gaussian mixture model
does not increase even when the apparent effective number
of parameters increases,
where the number of elements in the Gaussian mixture is controlled by
a continuous parameter.
- S. Akaho and H.J. Kappen:
Nonmonotonic generalization bias of Gaussian mixture models
Theories of learning and generalization hold
that the generalization bias, defined as
the difference between the training
error and the generalization error, increases on average with
the number of adaptive parameters.
The present paper, however, shows that this general tendency
is violated for a Gaussian mixture model. For temperatures just below the
first symmetry breaking point, the effective number of adaptive
parameters increases and the generalization bias decreases. We compute
the dependence of the Neural Information Criterion (NIC) on temperature
around the symmetry breaking. Our results are confirmed by numerical
cross-validation experiments.
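For reference, in the standard form of the Network Information Criterion (NIC) of
Murata, Yoshizawa and Amari (the notation below is ours, not taken from the paper),
the generalization bias is
$b(n) = E[E_{\rm gen}] - E[E_{\rm train}] \approx \frac{1}{n}\,{\rm tr}(Q^{-1}G)$,
and ${\rm NIC} = E_{\rm train} + \frac{1}{n}\,{\rm tr}(Q^{-1}G)$,
where $G$ is the covariance of the per-sample loss gradient and $Q$ is the expected
Hessian of the loss at the estimated parameters; ${\rm tr}(Q^{-1}G)$ plays the role
of the effective number of adaptive parameters whose nonmonotonic behaviour around
the symmetry breaking is analyzed as a function of temperature.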
ICA
- S. Akaho, Y. Kiuchi, S. Umeyama:
MICA: Multimodal Independent Component Analysis
We propose MICA (multimodal independent component analysis),
which extends ICA (independent component analysis)
to the case in which there is a pair of information sources.
MICA extracts statistically dependent pairs of
features from the sources, while the components of the feature vector
extracted from each source remain independent.
The cost function is therefore constructed to maximize the degree of
pairwise dependence in addition to optimizing the ICA cost function.
We approximate the cost function by a two-dimensional Gram-Charlier
expansion and propose a gradient descent algorithm derived from
Amari's natural gradient.
The relation between MICA and traditional CCA (canonical correlation analysis)
is similar to the relation between ICA and PCA (principal component analysis).
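For context, the update rule that plain ICA obtains from Amari's natural gradient
has the standard form $W \leftarrow W + \eta\,(I - E[g(y)y^T])\,W$; the MICA
algorithm builds on this kind of update for the paired sources. A minimal
single-source sketch (ordinary ICA only, not the MICA update itself; the tanh
nonlinearity and all names are illustrative):

    import numpy as np

    def natural_gradient_ica(x, n_iter=200, lr=0.1):
        """Standard natural-gradient ICA: W <- W + lr * (I - E[g(y) y^T]) W.
        x is a (dim, samples) array assumed to be zero-mean."""
        d, n = x.shape
        W = np.eye(d)
        for _ in range(n_iter):
            y = W @ x
            g = np.tanh(y)                           # a common choice of nonlinearity
            W += lr * (np.eye(d) - (g @ y.T) / n) @ W
        return W @ x, W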
- S. Akaho: Conditionally Independent Component Extraction for
Naive Bayes Inference
This paper extends the framework of independent component analysis (ICA)
to supervised learning.
The key idea is to find a representation of the input variables
that is conditionally independent given the output.
The representation is useful for naive Bayes learning, which has
been reported to perform as well as more sophisticated methods.
The learning algorithm is derived from a criterion similar to that of ICA,
in which two-dimensional entropy plays the role that
one-dimensional entropy plays in ICA.
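In other words, writing the extracted representation as $s = Wx$ (illustrative
notation, not the paper's), the goal is a representation satisfying
$p(s \mid y) = \prod_i p(s_i \mid y)$, so that the naive Bayes rule
$p(y \mid s) \propto p(y) \prod_i p(s_i \mid y)$ holds exactly rather than as an
approximation for the extracted features.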
Regularization and kernel methods
- S. Akaho:
Regularization Learning of Neural Networks for Generalization
In this paper, we propose a learning method for neural networks based on
the regularization method and analyze its generalization capability.
In learning from examples, training samples are independently drawn from
some unknown probability distribution.
The goal of learning is to minimize the expected risk for
future test samples, which are also drawn from the same distribution.
The problem can be reduced to estimating
the probability distribution from the samples alone, but this is generally ill-posed.
To solve it stably, we use the regularization method.
Regularization learning can be carried out in practice
by augmenting the sample set, adding an
appropriate amount of noise to the training samples.
We estimate its generalization error,
which is defined as the difference between the expected
risk achieved by the learning and the true minimum expected risk.
Assume that the $p$-dimensional density function is $s$-times differentiable
with respect to each variable.
We show that the mean square of the generalization error of regularization
learning is given by $D n^{-2s/(2s+p)}$, where $n$ is the number of samples
and $D$ is a constant depending on the complexity of the neural network and
the difficulty of the problem.
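A minimal sketch of the practical recipe mentioned above, namely regularization by
augmenting the training set with noisy copies of the samples (the function name,
number of copies and noise level are illustrative, not values from the paper):

    import numpy as np

    def augment_with_noise(X, y, n_copies=10, noise_std=0.1, seed=0):
        """Replicate each training sample n_copies times and add small Gaussian
        noise to the inputs; noise_std acts as the smoothing (regularization)
        parameter."""
        rng = np.random.default_rng(seed)
        X_rep = np.repeat(X, n_copies, axis=0)
        y_rep = np.repeat(y, n_copies, axis=0)
        return X_rep + noise_std * rng.standard_normal(X_rep.shape), y_rep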
VC dimension, Network capacity
- S. Akaho and S. Amari:
On the Capacity of three-layer networks
- S. Akaho:
Optimal Decay Rate of Connection Weights in Covariance Learning
An associative memory neural network cannot store more items than its memory
capacity. When new items are given one after another, the connection weights
should be decayed so that the number of stored items does not exceed
the memory capacity.
This paper presents the optimal decay rate
that maximizes the number of stored items, using the method of statistical
dynamics.
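To make the forgetting mechanism concrete, here is a minimal sketch of covariance
(Hebbian) learning with weight decay in a binary associative memory; the decay
factor is the quantity whose optimal value the paper derives, and the code itself
is only illustrative:

    import numpy as np

    def store_with_decay(patterns, decay):
        """Store +/-1 patterns one after another; before each new item is added,
        all connection weights are multiplied by the decay factor, so only the
        most recent items (up to the memory capacity) remain retrievable."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for xi in patterns:                      # xi in {-1, +1}^n
            W = decay * W + np.outer(xi, xi) / n
        np.fill_diagonal(W, 0.0)                 # no self-connections
        return W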
- S. Akaho:
Capacity and Error Correction Ability of Sparsely Encoded Associative Memory
with Forgetting Process
An associative memory neural network cannot store more items
than its memory capacity.
When new items are given one after another, its connection
weights should be decayed so that the number of stored items does
not exceed the memory capacity (the {\em forgetting process}).
This paper analyzes the sparsely encoded associative memory, and
presents the optimal decay rate that maximizes the number
of stored items.
The maximal number of stored items is given by
$O(n/(a\log n))$ when the decay rate is $1-O(a\log n/n)$, where the network
consists of $n$ neurons with activity $a$.
- S. Akaho:
VC dimension theory for a learning system with forgetting
In a changing environment,
forgetting old samples is an effective method to improve the adaptability
of learning systems.
However, forgetting too quickly causes a decrease in
generalization performance. In this paper, we analyze the generalization
performance of a learning system with a forgetting parameter.
For a class of binary discriminant functions,
it is proved that the generalization error is given by $O(\sqrt{h\alpha})$
($O(h\alpha)$ in a certain case),
where $h$ is the VC dimension of the class of functions and $1-\alpha$
represents a forgetting rate. The result provides a criterion to determine
the optimal forgetting rate.
Pattern Recognition, Feature extraction
- S. Akaho:
Translation, Scale and Rotation Invariant Features Based on High-Order
Autocorrelations
Local high-order autocorrelation features proposed by Otsu
have been successfully applied to
face recognition and many other pattern recognition problems.
These features are invariant under translation, but not under
scaling and rotation.
We construct scale and rotation invariant features
from the local high-order autocorrelation features.
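For reference, a local high-order autocorrelation feature has the form
$r(a_1,\dots,a_N) = \sum_x f(x)\,f(x+a_1)\cdots f(x+a_N)$. A minimal sketch for a
few displacement sets (the displacements below are illustrative, not Otsu's full
mask set, and the boundary treatment via np.roll is crude):

    import numpy as np

    def local_autocorrelation(img, displacements):
        """One local high-order autocorrelation feature: multiply the image by
        shifted copies of itself (one shift per displacement) and sum."""
        prod = img.astype(float)
        for dy, dx in displacements:
            prod = prod * np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        return prod.sum()

    # Example: zeroth-, first- and second-order features on a random image
    img = np.random.rand(32, 32)
    feats = [local_autocorrelation(img, d)
             for d in ([], [(0, 1)], [(1, 0)], [(0, 1), (1, 0)])]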
- S. Akaho:
Curve fitting that minimizes the mean square of perpendicular distances
from sample points
This paper presents a new method of curve-fitting to a set of
noisy samples.
When fitting a curve (or a curved surface) to given sample points,
it seems natural to determine the curve so as to minimize the mean square
of the perpendicular distances from the sample points.
However, it is difficult to obtain the optimal curve under this criterion.
In this paper, the perpendicular distance is approximated
by a local linear approximation of the function,
and an algorithm for obtaining a near-optimal curve is proposed.
Some simulation results are also shown.
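A minimal sketch of the linearized-perpendicular-distance idea for the special
case of fitting a circle: for an implicit curve $F(p)=0$, the perpendicular
distance from a point $p$ is approximated by $F(p)/\|\nabla F(p)\|$. The circle
model and the SciPy solver below are illustrative choices, not the paper's
algorithm:

    import numpy as np
    from scipy.optimize import least_squares

    def fit_circle_perpendicular(points):
        """Fit a circle (center a, b and radius r) to noisy 2-D points by
        minimizing the linearized perpendicular distances F(p)/||grad F(p)||."""
        x, y = points[:, 0], points[:, 1]

        def residuals(theta):
            a, b, r = theta
            rho = np.sqrt((x - a) ** 2 + (y - b) ** 2)
            return (rho ** 2 - r ** 2) / (2.0 * rho)   # approx. perpendicular distance

        theta0 = np.array([x.mean(), y.mean(), np.std(points)])
        return least_squares(residuals, theta0).x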
- S. Akaho, Y. Goto, H. Mizoguchi, T. Kurita:
Pursuit movement of pan-tilt camera by
feedback-error-learning
We investigate the ability of the feedback-error-learning (FEL) algorithm by
applying it to training a pan-tilt camera head to pursue a moving target.
Although the task of pursuit eye movement is reflective and the control
has a relatively large delay, our experiments show that
the camera head can successfully acquire the skill of pursuit eye movement.
Our experimental results also show that the learning performance
depends strongly on the gain parameter of the feedback controller module.
The system sometimes oscillates or breaks down when we use the optimal
feedback controller (large gain in general),
because of over-training-like phenomena.
If the gain is too small, the performance is not improved by learning,
though the system is stable.
These results suggest the great potential of the FEL algorithm
as well as the importance of the design of the feedback module.
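A minimal sketch of the feedback-error-learning structure: the total command is
$u = u_{\mathrm{ff}} + u_{\mathrm{fb}}$ with $u_{\mathrm{fb}} = K \times$ error,
and the feedforward module is trained using $u_{\mathrm{fb}}$ itself as the error
signal. The scalar plant and the single-weight feedforward model below are toy
stand-ins for the pan-tilt camera head and its inverse model:

    import numpy as np

    def feedback_error_learning(plant, target, K=1.0, lr=0.01):
        """Toy FEL loop: w is the single weight of the feedforward module and is
        updated with the feedback command u_fb as the teaching signal."""
        w, state = 0.0, 0.0
        for t in range(len(target)):
            u_ff = w * target[t]                 # feedforward command
            u_fb = K * (target[t] - state)       # feedback command (gain K)
            state = plant(state, u_ff + u_fb)
            w += lr * u_fb * target[t]           # FEL update
        return w

    # Toy usage: first-order plant tracking a sinusoidal target
    ts = np.linspace(0.0, 10.0, 1000)
    w = feedback_error_learning(lambda s, u: s + 0.5 * u, np.sin(ts))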
- S. Akaho, S. Hayamizu, O. Hasegawa, K. Itou, T. Akiba, H. Asoh,
T. Kurita, K. Sakaue, K. Tanaka, N. Otsu:
Recent Developments for Multimodal Interaction
by Visual Agent with Spoken Language
- H. Asoh, S. Akaho, O. Hasegawa, T. Yoshimura, S. Hayamizu:
Intermodal Learning of Multimodal Interaction Systems
- S. Akaho, et al.:
Multiple attribute learning with canonical correlation analysis and EM
algorithm
This paper presents a new framework for
learning pattern recognition, called ``multiple attribute learning''.
In the usual setting of pattern recognition,
target patterns have several attributes such as
color, size, and shape, and we can classify the patterns
in several ways: by their color, by their size, or by their shape.
In the standard pattern recognition problem, an attribute or
a mode of classification is chosen in advance and
the problem is thereby simplified.
In contrast, the problem considered in this paper is to make
the learning system solve multiple classification problems at once.
That is, a mixture of training data sets for multiple classification
problems is given to the learning system at once, and
the system learns multiple classification rules from the data.
We propose a method to solve this problem
using canonical correlation analysis and the EM algorithm.
The effectiveness of the method is demonstrated by experiments.
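For reference, the CCA building block used here can be computed from an SVD of the
whitened cross-covariance. The sketch below is plain CCA only, not the full
multiple-attribute learning procedure (which combines it with an EM loop); the
small ridge term eps is an illustrative numerical safeguard:

    import numpy as np

    def cca(X, Y, eps=1e-8):
        """Canonical correlation analysis: returns projection matrices for X and Y
        and the canonical correlations."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        n = X.shape[0]
        Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
        Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
        Cxy = X.T @ Y / n
        Wx = np.linalg.inv(np.linalg.cholesky(Cxx))     # whitening transform for X
        Wy = np.linalg.inv(np.linalg.cholesky(Cyy))     # whitening transform for Y
        U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
        return Wx.T @ U, Wy.T @ Vt.T, s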
- H. Asoh, Y. Motomura, I. Hara, S. Akaho, S. Hayamizu, T. Matsui:
Combining Probabilistic Map and Dialogue for Robust Life-long Office
Navigation
- H. Asoh, S. Hayamizu, I. Hara, Y. Motomura, S. Akaho and T. Matsui:
Socially Embedded Learning of the Office-Conversant Mobile
Robot Jijo-2
Misc
- S. Akaho:
Statistical learning in optimization:
Gaussian modeling for population search
Population search algorithms for optimization problems,
such as the genetic algorithm, are an effective way
to find an optimal value, especially when we have little information
about the objective function.
Baluja has proposed effective algorithms that model the distribution of elites
explicitly with a statistical model.
We propose such an algorithm based on Gaussian modeling of the elites, and
analyze the convergence properties of the algorithm by
defining the objective function through a stochastic model.
We point out that algorithms based on explicit
modeling of the elites' distribution tend to converge to undesirable
local optima, and we modify the algorithm to overcome this defect.
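A minimal sketch of population search with explicit Gaussian modeling of the
elites, in the spirit described above; the population size, elite count and the
small covariance jitter are illustrative choices:

    import numpy as np

    def gaussian_population_search(f, dim, pop_size=100, n_elite=20, n_gen=50, seed=0):
        """Sample a population from a Gaussian, keep the best individuals
        (minimization), refit the Gaussian to them, and repeat."""
        rng = np.random.default_rng(seed)
        mean, cov = np.zeros(dim), np.eye(dim)
        for _ in range(n_gen):
            pop = rng.multivariate_normal(mean, cov, size=pop_size)
            elite = pop[np.argsort([f(x) for x in pop])[:n_elite]]
            mean = elite.mean(axis=0)
            cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(dim)  # jitter against collapse
        return mean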
Requests for reprints and any questions or comments
should be sent to
s . a k a h o @ a i s t . g o . j p