Major publications of Shotaro Akaho
EM algorithm and mixture models (see also: multimodal)
- S. Akaho:
Mixture model for image understanding and the EM algorithm
We present a mixture model that can be applied to
the recognition of multiple objects in an image plane.
The model consists of submodules of arbitrary shape.
Each submodule is a probability density function of data points
with scale and shift parameters,
and the submodules are combined with mixing weights.
We present the EM (Expectation-Maximization) algorithm to estimate
those parameters.
We also modify the algorithm for the case in which the data points are restricted
to an attention window.
- S. Akaho:
The EM Algorithm for multiple object recognition
We propose a mixture model that can be applied to
the recognition of multiple objects in an image plane.
The model consists of modules of arbitrary shape.
Each module is a probability density function of data points
with scale and shift parameters,
and the modules are combined with mixing weights.
We present the EM (Expectation-Maximization) algorithm to estimate
those parameters.
We also modify the algorithm for the case in which the data points are restricted
to an attention window.
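The two entries above describe the same model, so a single illustration suffices.
Below is a minimal sketch of the EM loop, assuming one-dimensional Gaussian
submodules with shift (mean) and scale (standard deviation) parameters; all names
and default values are illustrative and not taken from the papers.

    import numpy as np

    def em_mixture(x, K, n_iter=100):
        """Minimal EM for a 1-D mixture of K Gaussian submodules, each with a
        shift (mu) and scale (sigma) parameter and a mixing weight w.
        x is a 1-D numpy array of data points."""
        n = len(x)
        rng = np.random.default_rng(0)
        mu = rng.choice(x, K)                  # shift parameters
        sigma = np.full(K, x.std())            # scale parameters
        w = np.full(K, 1.0 / K)                # mixing weights
        for _ in range(n_iter):
            # E-step: responsibility of each submodule for each data point
            d = (x[:, None] - mu) / sigma
            p = w * np.exp(-0.5 * d ** 2) / (sigma * np.sqrt(2 * np.pi))
            r = p / p.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights, shifts and scales
            nk = r.sum(axis=0)
            w = nk / n
            mu = (r * x[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        return w, mu, sigma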
- S. Akaho and H.J. Kappen:
Nonmonotonic generalization bias of Gaussian mixture models
Most theories of the generalization performance of learning tell us
that the generalization bias, which is defined as
the difference between the training error and the generalization error,
increases on average in proportion to
the number of modifiable parameters.
The present paper, however, reports a case in which
the generalization bias of a Gaussian mixture model
does not increase even when the apparent effective number
of parameters increases,
where the number of elements in the Gaussian mixture is controlled by
a continuous parameter.
- S. Akaho and H.J. Kappen:
Nonmonotonic generalization bias of Gaussian mixture models
Theories of learning and generalization hold
that the generalization bias, defined as
the difference between the training
error and the generalization error, increases on average with
the number of adaptive parameters.
The present paper, however, shows that this general tendency
is violated for a Gaussian mixture model. For temperatures just below the
first symmetry breaking point, the effective number of adaptive
parameters increases and the generalization bias decreases. We compute
the dependence of the Neural Information Criterion (NIC) on temperature
around the symmetry breaking. Our results are confirmed by numerical
cross-validation experiments.
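For reference, in the standard form of the Network Information Criterion (NIC) of
Murata, Yoshizawa and Amari (the notation below is ours, not taken from the paper),
the generalization bias is
$b(n) = E[E_{\rm gen}] - E[E_{\rm train}] \approx \frac{1}{n}\,{\rm tr}(Q^{-1}G)$,
and ${\rm NIC} = E_{\rm train} + \frac{1}{n}\,{\rm tr}(Q^{-1}G)$,
where $G$ is the covariance of the per-sample loss gradient and $Q$ is the expected
Hessian of the loss at the estimated parameters; ${\rm tr}(Q^{-1}G)$ plays the role
of the effective number of adaptive parameters whose nonmonotonic behaviour around
the symmetry breaking is analyzed as a function of temperature.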
ICA
- S. Akaho, Y. Kiuchi, S. Umeyama:
MICA: Multimodal Independent Component Analysis
We propose MICA (multimodal independent component analysis),
which extends ICA (independent component analysis)
to the case in which there is a pair of information sources.
MICA extracts statistically dependent pairs of
features from the sources, while the components of the feature vector
extracted from each source remain independent.
The cost function is therefore constructed to maximize the degree of
pairwise dependence in addition to optimizing the ICA cost function.
We approximate the cost function by a two-dimensional Gram-Charlier
expansion and propose a gradient descent algorithm derived from
Amari's natural gradient.
The relation between MICA and traditional CCA (canonical correlation analysis)
is similar to the relation between ICA and PCA (principal component analysis).
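For context, the update rule that plain ICA obtains from Amari's natural gradient
has the standard form $W \leftarrow W + \eta\,(I - E[g(y)y^T])\,W$; the MICA
algorithm builds on this kind of update for the paired sources. A minimal
single-source sketch (ordinary ICA only, not the MICA update itself; the tanh
nonlinearity and all names are illustrative):

    import numpy as np

    def natural_gradient_ica(x, n_iter=200, lr=0.1):
        """Standard natural-gradient ICA: W <- W + lr * (I - E[g(y) y^T]) W.
        x is a (dim, samples) array assumed to be zero-mean."""
        d, n = x.shape
        W = np.eye(d)
        for _ in range(n_iter):
            y = W @ x
            g = np.tanh(y)                           # a common choice of nonlinearity
            W += lr * (np.eye(d) - (g @ y.T) / n) @ W
        return W @ x, W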
- S. Akaho: Conditionally Independent Component Extraction for
Naive Bayes Inference
This paper extends the framework of independent component analysis (ICA)
to supervised learning.
The key idea is to find a representation of the input variables
that is conditionally independent given the output.
The representation is useful for naive Bayes learning, which has
been reported to perform as well as more sophisticated methods.
The learning algorithm is derived from a criterion similar to that of ICA,
in which two-dimensional entropy plays the role that
one-dimensional entropy plays in ICA.
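In other words, writing the extracted representation as $s = Wx$ (illustrative
notation, not the paper's), the goal is a representation satisfying
$p(s \mid y) = \prod_i p(s_i \mid y)$, so that the naive Bayes rule
$p(y \mid s) \propto p(y) \prod_i p(s_i \mid y)$ holds exactly rather than as an
approximation for the extracted features.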
Regularization and kernel methods
- S. Akaho:
Regularization Learning of Neural Networks for Generalization
In this paper, we propose a learning method for neural networks based on
the regularization method and analyze its generalization capability.
In learning from examples, training samples are independently drawn from
some unknown probability distribution.
The goal of learning is to minimize the expected risk for
future test samples, which are also drawn from the same distribution.
The problem can be reduced to estimating
the probability distribution from the samples alone, but this is generally ill-posed.
To solve it stably, we use the regularization method.
Regularization learning can be carried out in practice
by augmenting the sample set, adding an
appropriate amount of noise to the training samples.
We estimate its generalization error,
which is defined as the difference between the expected
risk achieved by the learning and the true minimum expected risk.
Assume that the $p$-dimensional density function is $s$-times differentiable
with respect to each variable.
We show that the mean square of the generalization error of regularization
learning is given by $D n^{-2s/(2s+p)}$, where $n$ is the number of samples
and $D$ is a constant depending on the complexity of the neural network and
the difficulty of the problem.
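A minimal sketch of the practical recipe mentioned above, namely regularization by
augmenting the training set with noisy copies of the samples (the function name,
number of copies and noise level are illustrative, not values from the paper):

    import numpy as np

    def augment_with_noise(X, y, n_copies=10, noise_std=0.1, seed=0):
        """Replicate each training sample n_copies times and add small Gaussian
        noise to the inputs; noise_std acts as the smoothing (regularization)
        parameter."""
        rng = np.random.default_rng(seed)
        X_rep = np.repeat(X, n_copies, axis=0)
        y_rep = np.repeat(y, n_copies, axis=0)
        return X_rep + noise_std * rng.standard_normal(X_rep.shape), y_rep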
VC dimension, Network capacity
- S. Akaho and S. Amari:
On the Capacity of three-layer networks
- S. Akaho:
Optimal Decay Rate of Connection Weights in Covariance Learning
An associative memory neural network cannot store more items than its memory
capacity. When new items are given one after another, the connection weights
should be decayed so that the number of stored items does not exceed
the memory capacity.
This paper presents the optimal decay rate
that maximizes the number of stored items, using the method of statistical
dynamics.
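To make the forgetting mechanism concrete, here is a minimal sketch of covariance
(Hebbian) learning with weight decay in a binary associative memory; the decay
factor is the quantity whose optimal value the paper derives, and the code itself
is only illustrative:

    import numpy as np

    def store_with_decay(patterns, decay):
        """Store +/-1 patterns one after another; before each new item is added,
        all connection weights are multiplied by the decay factor, so only the
        most recent items (up to the memory capacity) remain retrievable."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for xi in patterns:                      # xi in {-1, +1}^n
            W = decay * W + np.outer(xi, xi) / n
        np.fill_diagonal(W, 0.0)                 # no self-connections
        return W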
- S. Akaho:
Capacity and Error Correction Ability of Sparsely Encoded Associative Memory
with Forgetting Process
An associative memory neural network cannot store more items
than its memory capacity.
When new items are given one after another, its connection
weights should be decayed so that the number of stored items does
not exceed the memory capacity (the {\em forgetting process}).
This paper analyzes the sparsely encoded associative memory, and
presents the optimal decay rate that maximizes the number
of stored items.
The maximal number of stored items is given by
$O(n/(a\log n))$ when the decay rate is $1-O(a\log n/n)$, where the network
consists of $n$ neurons with activity $a$.
- S. Akaho:
VC dimension theory for a learning system with forgetting
In a changing environment,
forgetting old samples is an effective method to improve the adaptability
of learning systems.
However, forgetting too quickly causes a decrease in
generalization performance. In this paper, we analyze the generalization
performance of a learning system with a forgetting parameter.
For a class of binary discriminant functions,
it is proved that the generalization error is given by $O(\sqrt{h\alpha})$
($O(h\alpha)$ in a certain case),
where $h$ is the VC dimension of the class of functions and $1-\alpha$
represents a forgetting rate. The result provides a criterion to determine
the optimal forgetting rate.
Pattern Recognition, Feature extraction
- S. Akaho:
Translation, Scale and Rotation Invariant Features Based on High-Order
Autocorrelations
Local high-order autocorrelation features proposed by Otsu
have been successfully applied to
face recognition and many other pattern recognition problems.
These features are invariant under translation, but not under
scaling and rotation.
We construct scale and rotation invariant features
from the local high-order autocorrelation features.
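For reference, a local high-order autocorrelation feature has the form
$r(a_1,\dots,a_N) = \sum_x f(x)\,f(x+a_1)\cdots f(x+a_N)$. A minimal sketch for a
few displacement sets (the displacements below are illustrative, not Otsu's full
mask set, and the boundary treatment via np.roll is crude):

    import numpy as np

    def local_autocorrelation(img, displacements):
        """One local high-order autocorrelation feature: multiply the image by
        shifted copies of itself (one shift per displacement) and sum."""
        prod = img.astype(float)
        for dy, dx in displacements:
            prod = prod * np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        return prod.sum()

    # Example: zeroth-, first- and second-order features on a random image
    img = np.random.rand(32, 32)
    feats = [local_autocorrelation(img, d)
             for d in ([], [(0, 1)], [(1, 0)], [(0, 1), (1, 0)])]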
- S. Akaho:
Curve fitting that minimizes the mean square of perpendicular distances
from sample points
This paper presents a new method of curve-fitting to a set of
noisy samples.
When fitting a curve (or a curved surface) to given sample points,
it seems natural to determine the curve so as to minimize the mean square
of the perpendicular distances from the sample points.
However, it is difficult to obtain the optimal curve under this criterion.
In this paper, the perpendicular distance is approximated
by a local linear approximation of the function,
and an algorithm for obtaining a near-optimal curve is proposed.
Some simulation results are also shown.
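A minimal sketch of the linearized-perpendicular-distance idea for the special
case of fitting a circle: for an implicit curve $F(p)=0$, the perpendicular
distance from a point $p$ is approximated by $F(p)/\|\nabla F(p)\|$. The circle
model and the SciPy solver below are illustrative choices, not the paper's
algorithm:

    import numpy as np
    from scipy.optimize import least_squares

    def fit_circle_perpendicular(points):
        """Fit a circle (center a, b and radius r) to noisy 2-D points by
        minimizing the linearized perpendicular distances F(p)/||grad F(p)||."""
        x, y = points[:, 0], points[:, 1]

        def residuals(theta):
            a, b, r = theta
            rho = np.sqrt((x - a) ** 2 + (y - b) ** 2)
            return (rho ** 2 - r ** 2) / (2.0 * rho)   # approx. perpendicular distance

        theta0 = np.array([x.mean(), y.mean(), np.std(points)])
        return least_squares(residuals, theta0).x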
- S. Akaho, Y. Goto, H. Mizoguchi, T. Kurita:
Pursuit movement of pan-tilt camera by
feedback-error-learning
We investigate the ability of the feedback-error-learning (FEL) algorithm by
applying it to training a pan-tilt camera head to pursue a moving target.
Although the task of pursuit eye movement is reflective and the control
has a relatively large delay, our experiments show that
the camera head can successfully acquire the skill of pursuit eye movement.
Our experimental results also show that the learning performance
depends strongly on the gain parameter of the feedback controller module.
The system sometimes oscillates or breaks down when we use the optimal
feedback controller (large gain in general),
because of over-training-like phenomena.
If the gain is too small, the performance is not improved by learning,
though the system is stable.
These results suggest the great potential of the FEL algorithm
as well as the importance of the design of the feedback module.
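A minimal sketch of the feedback-error-learning structure: the total command is
$u = u_{\mathrm{ff}} + u_{\mathrm{fb}}$ with $u_{\mathrm{fb}} = K \times$ error,
and the feedforward module is trained using $u_{\mathrm{fb}}$ itself as the error
signal. The scalar plant and the single-weight feedforward model below are toy
stand-ins for the pan-tilt camera head and its inverse model:

    import numpy as np

    def feedback_error_learning(plant, target, K=1.0, lr=0.01):
        """Toy FEL loop: w is the single weight of the feedforward module and is
        updated with the feedback command u_fb as the teaching signal."""
        w, state = 0.0, 0.0
        for t in range(len(target)):
            u_ff = w * target[t]                 # feedforward command
            u_fb = K * (target[t] - state)       # feedback command (gain K)
            state = plant(state, u_ff + u_fb)
            w += lr * u_fb * target[t]           # FEL update
        return w

    # Toy usage: first-order plant tracking a sinusoidal target
    ts = np.linspace(0.0, 10.0, 1000)
    w = feedback_error_learning(lambda s, u: s + 0.5 * u, np.sin(ts))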
- S. Akaho, S. Hayamizu, O. Hasegawa, K. Itou, T. Akiba, H. Asoh,
T. Kurita, K. Sakaue, K. Tanaka, N. Otsu:
Recent Developments for Multimodal Interaction
by Visual Agent with Spoken Language
- H. Asoh, S. Akaho, O. Hasegawa, T. Yoshimura, S. Hayamizu:
Intermodal Learning of Multimodal Interaction Systems
- S. Akaho, et al.:
Multiple attribute learning with canonical correlation analysis and EM
algorithm
This paper presents a new framework for
learning pattern recognition, called ``multiple attribute learning''.
In the usual setting of pattern recognition,
target patterns have several attributes such as
color, size, and shape, and we can classify the patterns
in several ways: by their color, by their size, or by their shape.
In the standard pattern recognition problem, an attribute or
a mode of classification is chosen in advance and
the problem is thereby simplified.
In contrast, the problem considered in this paper is to make
the learning system solve multiple classification problems at once.
That is, a mixture of training data sets for multiple classification
problems is given to the learning system at once, and
the system learns multiple classification rules from the data.
We propose a method to solve this problem
using canonical correlation analysis and the EM algorithm.
The effectiveness of the method is demonstrated by experiments.
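For reference, the CCA building block used here can be computed from an SVD of the
whitened cross-covariance. The sketch below is plain CCA only, not the full
multiple-attribute learning procedure (which combines it with an EM loop); the
small ridge term eps is an illustrative numerical safeguard:

    import numpy as np

    def cca(X, Y, eps=1e-8):
        """Canonical correlation analysis: returns projection matrices for X and Y
        and the canonical correlations."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        n = X.shape[0]
        Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
        Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
        Cxy = X.T @ Y / n
        Wx = np.linalg.inv(np.linalg.cholesky(Cxx))     # whitening transform for X
        Wy = np.linalg.inv(np.linalg.cholesky(Cyy))     # whitening transform for Y
        U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
        return Wx.T @ U, Wy.T @ Vt.T, s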
- H. Asoh, Y. Motomura, I. Hara, S. Akaho, S. Hayamizu, T. Matsui:
Combining Probabilistic Map and Dialogue for Robust Life-long Office
Navigation
- H. Asoh, S. Hayamizu, I. Hara, Y. Motomura, S. Akaho and T. Matsui:
Socially Embedded Learning of the Office-Conversant Mobile
Robot Jijo-2
Misc
- S. Akaho:
Statistical learning in optimization:
Gaussian modeling for population search
Population search algorithms for optimization problems,
such as the genetic algorithm, are an effective way
to find an optimal value, especially when we have little information
about the objective function.
Baluja has proposed effective algorithms that model the distribution of elites
explicitly with a statistical model.
We propose such an algorithm based on Gaussian modeling of the elites, and
analyze the convergence properties of the algorithm by
defining the objective function through a stochastic model.
We point out that algorithms based on explicit
modeling of the elites' distribution tend to converge to undesirable
local optima, and we modify the algorithm to overcome this defect.
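A minimal sketch of population search with explicit Gaussian modeling of the
elites, in the spirit described above; the population size, elite count and the
small covariance jitter are illustrative choices:

    import numpy as np

    def gaussian_population_search(f, dim, pop_size=100, n_elite=20, n_gen=50, seed=0):
        """Sample a population from a Gaussian, keep the best individuals
        (minimization), refit the Gaussian to them, and repeat."""
        rng = np.random.default_rng(seed)
        mean, cov = np.zeros(dim), np.eye(dim)
        for _ in range(n_gen):
            pop = rng.multivariate_normal(mean, cov, size=pop_size)
            elite = pop[np.argsort([f(x) for x in pop])[:n_elite]]
            mean = elite.mean(axis=0)
            cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(dim)  # jitter against collapse
        return mean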
Requests for reprints and any questions or comments
should be sent to
s . a k a h o @ a i s t . g o . j p