IEEE Transactions on Artificial Intelligence,
in press.
Abstract
Recent developments in measurement tools have made it easier to obtain shape data, a collection of point coordinates in vector space that are meaningful when some of them are gathered together.
As a result, the clustering of shape data has become increasingly important.
However, few studies have achieved clustering that is applicable in a wide variety of cases, because many rely on their own specific shape representations.
Thus, we apply a simple and widely recognized representation and generative model to shape data.
A configuration matrix of the point coordinates is used for the representation,
and it is the simplest and most well-accepted representation in conventional shape analysis.
As a generative model, we consider the mixture density function, a well-known model in statistics for expressing a population density function,
which is a linear combination of subpopulation density functions.
The aim of this paper is to present a mixture density-based model that will be useful for clustering shape data.
The clustering of shapes involves estimating the parameters of the model, and this estimation is derived using an EM algorithm based on the model.
As examples of promising shape-data applications, the computational analyses of ape skulls, American football formations, and baseball pitches were performed.
In addition, we evaluated the performance of the EM algorithm by comparing it with other typical clustering methods.
The theoretical results not only contribute to statistical estimation for shape data but also extend clustering to non-vector shape data.
The experimental results show that the derived EM algorithm performs well in shape clustering.
Keywords
clustering, expectation-maximization algorithm, maximum likelihood estimation, mixture density function
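As a point of reference for the mixture-density idea above, the following is a minimal EM sketch for a spherical Gaussian mixture on vectorized data. It is not the paper's shape-specific model, which operates on configuration matrices; the function name `em_gmm` and the deterministic initialization are assumptions made purely for illustration.

```python
import numpy as np

def em_gmm(X, K, n_iter=50):
    """EM for a spherical Gaussian mixture (an illustrative sketch only).

    X: (n, d) data matrix; K: number of clusters.
    Returns mixing weights, means, and responsibilities.
    """
    n, d = X.shape
    # Simple deterministic initialization: evenly strided data points.
    means = X[:: max(1, n // K)][:K].copy()
    weights = np.full(K, 1.0 / K)
    var = X.var() + 1e-6                         # shared spherical variance
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k N(x_i | mu_k, var I).
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        logp = np.log(weights) - 0.5 * sq / var
        logp -= logp.max(axis=1, keepdims=True)  # subtract max for stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and the shared variance.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        sq_new = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (r * sq_new).sum() / (n * d) + 1e-6
    return weights, means, r

# Two well-separated 2-D blobs; clustering should recover the grouping.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
w, mu, r = em_gmm(X, K=2)
labels = r.argmax(axis=1)
```

The responsibilities play the role of soft cluster assignments; replacing the Euclidean squared distance with a shape distance is the kind of extension the paper develops.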
Journal of Computational Science,
Vol. 85, 102494, Feb. 2025.
Abstract
We consider movements between two chosen points in a bounded set in the plane.
The length of the movement between two points is measured by the Euclidean metric.
We can then obtain the distribution of distances over the bounded set by repeatedly measuring the lengths of the movements.
When we select two points uniformly in a bounded set,
the shape of the bounded set affects the distribution.
The distribution is easily computable by integral approximation and usefully remains invariant under the group of congruence transformations on the bounded set.
For these reasons, some fundamental methods based on this distribution have been used in shape analysis.
Thus, this distribution is important to some practical methods based on the Euclidean metric.
Although some articles have been devoted to the study of non-Euclidean metrics in pattern recognition including shape analysis,
only a few attempts have so far been made using the distribution of such non-Euclidean metrics.
In this paper, we present the expression for the distributions with several non-Euclidean metrics using a density for the set of lines.
This expression can be efficiently calculated as a sum of distributions, by generating a finite set of lines.
We concentrate on a bounded convex set in the plane to define a non-Euclidean metric.
In examples of such metrics on the convex set, we discuss those well-known in taxicab and projective geometries.
In the experimental results using several convex sets, we visualize the distributions with the non-Euclidean metrics by plotting their graphs.
Comparing the distribution based on the Euclidean metric with those based on the non-Euclidean metrics,
we reveal several differences among the distributions.
Keywords
integral approximation, non-Euclidean metric, distribution of distances, plane bounded convex set, Hough transform
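The Euclidean baseline against which the non-Euclidean distributions are compared can be approximated by simple Monte Carlo sampling rather than the paper's line-density expression. A minimal sketch on the unit square (a hypothetical choice of bounded convex set):

```python
import math
import random

def distance_sample(n_pairs, seed=0):
    """Sample Euclidean distances between uniformly chosen point pairs
    in the unit square, approximating the distance distribution."""
    rng = random.Random(seed)
    ds = []
    for _ in range(n_pairs):
        x1, y1, x2, y2 = (rng.random() for _ in range(4))
        ds.append(math.hypot(x2 - x1, y2 - y1))
    return ds

ds = distance_sample(100_000)
mean_d = sum(ds) / len(ds)
# For the unit square, the exact mean distance is
# (2 + sqrt(2) + 5*ln(1 + sqrt(2))) / 15, approximately 0.5214.
```

A histogram of `ds` visualizes the distribution; changing the sampling region shows the shape-dependence noted in the abstract.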
Kazunori Iwata, Hiroki Yamamoto, and Kazushi Mimura
IEICE Transactions on Information and Systems,
Vol. E104-D, No. 2, pp. 285-293, Feb. 2021.
Abstract
Shape matching with local descriptors is an underlying scheme in shape
analysis.
We can visually confirm the matching results and also
assess them for shape classification.
Generally, shape matching is implemented by determining the
correspondence between shapes that are represented by their
respective sets of sampled points.
Some matching methods have already been proposed; the main difference
between them lies in their choice of matching cost function.
This function measures the dissimilarity between the local
distribution of sampled points around a focusing point of one shape
and the local distribution of sampled points around a referring point
of another shape.
A local descriptor is used to describe the distribution of sampled
points around the point of the shape.
In this paper, we propose an extended scheme for shape matching
that can compensate for errors in existing local descriptors.
Our scheme is convenient for existing local descriptors because it does not
require them to be modified.
The main idea of our scheme is to consider the correspondence of
neighboring sampled points to a focusing point when determining the
correspondence of the focusing point.
This is useful because it increases the chance of finding a
suitable correspondence.
However, considering the correspondence of neighboring points
causes a problem regarding computational feasibility, because
there is a substantial increase in the number of possible
correspondences that need to be considered in shape matching.
We solve this problem using a branch-and-bound algorithm for efficient
approximation.
Using several shape datasets, we demonstrate that our scheme yields a
more suitable matching than the conventional scheme, which does not consider
the correspondence of neighboring sampled points, while requiring only a
small increase in execution time.
Keywords
shape matching, local shape descriptor, probability density estimator,
branch-and-bound algorithm
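The matching-cost idea above can be illustrated with a deliberately crude local descriptor. The sketch below is not the paper's method: it describes each sampled point by a normalized histogram of directions to the other points, compares histograms with a chi-square matching cost, and assigns correspondences greedily. The names `angle_histogram` and `match_greedy` are hypothetical; the paper's branch-and-bound search over neighboring correspondences would replace the greedy step.

```python
import numpy as np

def angle_histogram(points, i, n_bins=8):
    """Histogram of directions from point i to the other sampled points
    (a crude stand-in for a local shape descriptor)."""
    diffs = np.delete(points, i, axis=0) - points[i]
    ang = np.arctan2(diffs[:, 1], diffs[:, 0])            # angles in (-pi, pi]
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi))
    return hist / hist.sum()

def chi2_cost(h1, h2, eps=1e-12):
    """Chi-square matching cost between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def match_greedy(A, B):
    """Greedy correspondence: each point of A picks its cheapest point of B."""
    HA = np.array([angle_histogram(A, i) for i in range(len(A))])
    HB = np.array([angle_histogram(B, j) for j in range(len(B))])
    cost = np.array([[chi2_cost(a, b) for b in HB] for a in HA])
    return cost.argmin(axis=1), cost

# Matching a square against an identical copy should give the identity map.
A = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
B = A.copy()
corr, cost = match_greedy(A, B)
```

A real application would compare two different shapes; the identical copy merely makes the expected correspondence obvious.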
IEICE Transactions on Information and Systems,
Vol. E103-D, No. 12, pp. 2649-2658, Dec. 2020.
Abstract
The nearest neighbor method is a simple and flexible scheme for the
classification of data points in a vector space.
It predicts a class label of an unseen data point using a majority rule
for the labels of known data points inside a neighborhood of the
unseen data point.
Because it sometimes achieves good performance even for
complicated problems, several derivatives of it have been studied.
Among them, the discriminant adaptive nearest neighbor method is
particularly worth revisiting to demonstrate its application.
The main idea of this method is to adjust the neighbor metric of an
unseen data point to the set of known data points before label prediction.
It often improves the prediction, provided the neighbor metric is
adjusted well.
For statistical shape analysis, shape classification attracts
attention because it is a vital topic in shape analysis.
However, because a shape is generally expressed as a matrix, it is
non-trivial to apply the discriminant adaptive nearest neighbor method
to shape classification.
Thus, in this study, we extend the discriminant adaptive nearest
neighbor method to make it more useful in shape classification.
To achieve this development, a mixture model and optimization
algorithm for shape clustering are incorporated into the method.
Furthermore, we describe several helpful techniques for the initial
guess of the model parameters in the optimization algorithm.
Using several shape datasets, we demonstrate that our method is
successful for shape classification.
Keywords
shape classification, ordinary Procrustes sum of squares, nearest neighbor method, discriminant adaptive nearest neighbor method
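For orientation, the base method the paper builds on is the classic nearest neighbor rule with a majority vote. A minimal sketch follows; the discriminant adaptive variant would additionally reshape the Euclidean neighbor metric around the query point, which is not shown here, and `knn_predict` is an illustrative name.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classic k-nearest-neighbor prediction with a majority rule."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean neighbor metric
    nearest = np.argsort(d)[:k]               # indices of the k closest points
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Two tight clusters with known labels; queries near each cluster
# should inherit that cluster's label.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X, y, np.array([0.05, 0.05]))
```

Because a shape is a matrix rather than a vector, the distance computation in the first line is exactly what must be replaced (e.g., by a Procrustes-type distance) to apply this rule to shapes.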
IEICE Transactions on Information and Systems,
Vol. E102-D, No. 12, pp. 2547-2556, Dec. 2019.
Abstract
Consider selecting points on a contour in the x-y plane.
In shape analysis, this is frequently referred to as contour sampling.
It is important to select the points such that they effectively represent the shape of the contour.
Generally, the stroke order and number of strokes are informative for that purpose.
Several effective methods exist for sampling contours drawn with a certain stroke order and number of strokes, such as the English alphabet or Arabic figures.
However, many contours entail an uncertain stroke order and number of strokes, such as pictures of symbols, and little research has focused on methods for sampling such contours.
This is because selecting the points in this case typically requires a
large computational cost to check all the possible choices.
In this paper, we present a sampling method that is useful regardless of whether the contours are drawn with a certain stroke order and number of strokes or not.
Our sampling method thereby expands the application possibilities of contour processing.
We formulate contour sampling as a discrete optimization problem that can be solved using a type of direct search.
Based on a geometric graph whose vertices are the points and whose edges form rectangles, we construct an effective objective function for the problem.
Using different shape datasets, we demonstrate that our sampling method is effective with respect to shape representation and retrieval.
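A common baseline against which such sampling methods are judged is uniform arc-length resampling of a polyline contour. The sketch below shows that baseline only; the paper's method instead selects points by discrete optimization over a geometric graph.

```python
import numpy as np

def resample_contour(points, n_samples):
    """Uniform arc-length resampling of an open polyline contour.

    A common baseline for contour sampling; for a closed contour,
    append the first point to the end before calling.
    """
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)  # segment lengths
    t = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arc length
    targets = np.linspace(0.0, t[-1], n_samples)           # equally spaced lengths
    xs = np.interp(targets, t, points[:, 0])
    ys = np.interp(targets, t, points[:, 1])
    return np.column_stack([xs, ys])

# An L-shaped contour of total length 2, resampled into 5 points.
contour = [[0, 0], [1, 0], [1, 1]]
samples = resample_contour(contour, 5)
```

Uniform arc-length sampling needs a known traversal order, which is precisely what is unavailable for contours with an uncertain stroke order; that gap motivates the optimization-based method above.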
IEEE Transactions on Neural Networks and Learning Systems,
Vol. 28, No. 8, pp. 1865-1877, Aug. 2017.
Abstract
Softmax selection is one of the most popular methods for action selection in reinforcement learning.
Although various recently proposed methods may be more effective with
full parameter tuning, implementing a complicated method that
requires the tuning of many parameters can be difficult.
Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning.
In fact, this method works adequately in practice with only one parameter appropriately set for the environment.
The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning.
To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection.
Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability.
The bandwidth is quantitatively assessed in a series of statistical tests.
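Softmax (Boltzmann) selection itself is standard; a minimal sketch with a single temperature parameter follows. The paper's contribution concerns how the parameter setting is derived from the asymptotic equipartition property, which is not reproduced here.

```python
import math
import random

def softmax_select(q_values, temperature, rng=random):
    """Softmax (Boltzmann) action selection: the probability of an action
    grows exponentially with its value estimate, scaled by a temperature."""
    m = max(q_values)                          # subtract max for stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r < acc:
            return a, probs
    return len(probs) - 1, probs

# With a low temperature, the highest-valued action dominates.
rng = random.Random(0)
q = [1.0, 2.0, 0.5]
counts = [0, 0, 0]
for _ in range(10_000):
    a, probs = softmax_select(q, temperature=0.5, rng=rng)
    counts[a] += 1
```

Raising the temperature flattens `probs` toward uniform exploration; the "bandwidth of good parameters" in the abstract refers to the range of temperature settings that still yield good policies.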
Kosei Kurisu, Nobuo Suematsu, Kazunori Iwata, and Akira Hayashi
IEICE Transactions on Information and Systems,
Vol. E98-D, No. 4, pp. 930-937, Apr. 2015.
Abstract
In image segmentation, finite mixture modeling has been widely used.
In its simplest form, the spatial correlation among neighboring pixels
is not taken into account, and segmentation results can be severely
degraded by image noise.
We propose a spatially correlated mixture model in which the mixing
proportions of finite mixture models are governed by a set of
underlying functions defined on the image space.
The spatial correlation among pixels is introduced by putting a
Gaussian process prior on the underlying functions.
We can set the spatial correlation rather directly and flexibly by
choosing the covariance function of the Gaussian process prior.
The effectiveness of our model is demonstrated by experiments with
synthetic and real images.
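The role of the covariance function can be illustrated with a squared-exponential (RBF) kernel on pixel coordinates, a common choice for Gaussian process priors; the kernel, its parameters, and the name `rbf_cov` are illustrative assumptions, not necessarily those used in the paper.

```python
import numpy as np

def rbf_cov(coords, length_scale=2.0, variance=1.0):
    """Squared-exponential covariance on pixel coordinates:
    nearby pixels receive strongly correlated function values."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

# A 1-D "image" of 10 pixel positions.
coords = np.arange(10, dtype=float)[:, None]
K = rbf_cov(coords, length_scale=2.0)
# Sample one underlying function from the GP prior N(0, K);
# a small jitter keeps the covariance numerically positive definite.
rng = np.random.default_rng(0)
f = rng.multivariate_normal(np.zeros(10), K + 1e-8 * np.eye(10))
```

Increasing `length_scale` makes sampled functions, and hence the mixing proportions they govern, smoother across the image: this is the "rather direct and flexible" control over spatial correlation mentioned above.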
Pattern Recognition,
Vol. 46, No. 12, pp. 3452-3459, Dec. 2013.
Abstract
The generalized Viterbi algorithm, a direct extension of the Viterbi
algorithm for hidden Markov models (HMMs), has been used to find the
most likely state sequence for hierarchical HMMs.
However, the generalized Viterbi algorithm finds the most likely whole level
state sequence rather than the most likely upper level state
sequence.
In this paper, we propose a marginalized Viterbi algorithm, which
finds the most likely upper level state sequence by marginalizing
lower level state sequences.
We show experimentally that the marginalized Viterbi algorithm is
more accurate than the generalized Viterbi algorithm in terms of
upper level state sequence estimation.
Keywords
time series data, hierarchical HMM, finding the most likely state
sequence, generalized Viterbi algorithm, marginalized Viterbi algorithm
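For contrast with the marginalized variant, the standard Viterbi algorithm for a flat HMM can be sketched as log-domain dynamic programming with backpointers:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi algorithm: most likely hidden state sequence of an HMM.

    pi: (S,) initial probabilities; A: (S, S) transition matrix;
    B: (S, V) emission matrix; obs: sequence of observation indices.
    """
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))           # best log-prob of paths ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    logA, logB = np.log(A), np.log(B)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # (prev state, current state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Backtrack the best path from the final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Two sticky states that mostly emit symbol 0 or symbol 1, respectively.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
path = viterbi(pi, A, B, [0, 0, 1, 1, 1])
```

The generalized and marginalized variants discussed in the paper extend this maximization over the state hierarchy of a hierarchical HMM, with the marginalized version summing out the lower-level sequences instead of maximizing over them.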
Katsutoshi Ueaoki, Kazunori Iwata, Nobuo Suematsu, and Akira Hayashi
IEICE Transactions on Information and Systems,
Vol. E94-D, No. 12, pp. 2487-2494, Dec. 2011.
Abstract
A two-dimensional shape is generally represented with line drawings or
object contours in a digital image.
Shapes can be divided into two types, namely ordered and unordered
shapes.
An ordered shape is an ordered set of points, while an unordered shape
is an unordered set.
As a result, each type typically uses different attributes to define the
local descriptors involved in representing the local distributions of
points sampled from the shape.
Throughout this paper, we focus on unordered shapes.
Since most local descriptors of unordered shapes are not
scale-invariant, we usually make the shapes in an image data set the
same size through scale normalization, before applying shape matching
procedures.
Shapes obtained through scale normalization are suitable for such
descriptors if the original whole shapes are similar.
However, they are not suitable if parts of each original shape are drawn
using different scales.
Thus, in this paper, we present a scale-invariant descriptor
constructed from von Mises distributions to deal with such shapes.
Since this descriptor has the merits of being both scale-invariant and a
probability distribution, it does not require scale normalization and
can employ an arbitrary measure of probability distributions in
matching shape points.
In experiments on shape matching and retrieval, we show the
effectiveness of our descriptor, compared to several conventional
descriptors.
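The building block of such a descriptor, the von Mises density on angles, can be evaluated directly. The sketch below shows only the density itself, not the full descriptor construction; since it depends on angles alone, it is unaffected by scaling.

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    """Density of the von Mises distribution on angles:
    f(theta) = exp(kappa * cos(theta - mu)) / (2 * pi * I0(kappa)),
    where I0 is the modified Bessel function of order zero (np.i0).
    mu is the mean direction; kappa controls the concentration."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

thetas = np.linspace(-np.pi, np.pi, 1000, endpoint=False)
pdf = von_mises_pdf(thetas, mu=0.0, kappa=2.0)
# The density integrates to one over the circle and peaks at theta = mu.
area = pdf.sum() * (2 * np.pi / 1000)
```

Because the descriptor is a genuine probability distribution, any dissimilarity between distributions (e.g., a chi-square or Kullback-Leibler measure) can be used when matching shape points, as the abstract notes.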
Neural Networks, Vol. 24, No. 10, pp. 1074-1081, Dec. 2011.
Abstract
We present a general analysis of return maximization in reinforcement
learning.
This analysis does not require assumptions of Markovianity,
stationarity, and ergodicity for the stochastic sequential decision
processes of reinforcement learning.
Instead, our analysis assumes the asymptotic equipartition property
fundamental to information theory, providing a substantially
different view from that in the literature.
As our main results, we show that return maximization is achieved by
the overlap of typical and best sequence sets, and we present a class
of stochastic sequential decision processes with the necessary
condition for return maximization.
We also describe several examples of best sequences in terms of
return maximization in the class of stochastic sequential decision
processes, which satisfy the necessary condition.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 30, No. 1, pp. 76-88, Jan. 2008.
Abstract
We introduce a novel dissimilarity measure into a probabilistic clustering task
to properly measure the dissimilarity among multiple clusters when each
cluster is characterized by a subpopulation in a mixture model.
This measure of dissimilarity is called redundancy-based dissimilarity
among probability distributions.
From aspects of both source coding and a statistical hypothesis test,
we shed light on several of the theoretical reasons for the
redundancy-based dissimilarity among probability distributions being a
reasonable measure of dissimilarity among clusters.
We also elucidate a principle in common for the measures of
redundancy-based dissimilarity and Ward's method in terms of
hierarchical clustering criteria.
Moreover, we show several related theorems that are significant for
clustering tasks.
In the experiments, properties of the measure of redundancy-based
dissimilarity are examined in comparison with several other measures.
Keywords
clustering, mixture model, dissimilarity measure, information theory,
Ward's method
IEEE Transactions on Neural Networks, Vol. 17, No. 4, pp. 829-842, July 2006.
Abstract
We exhibit an important property called the asymptotic equipartition
property on empirical sequences in an ergodic multi-agent Markov
decision process.
Using the asymptotic equipartition property, which facilitates the
analysis of multi-agent learning, we give a statistical property of
multi-agent learning, such as reinforcement learning, near the end of
the learning process.
We examine the effect of the conditions among the agents on the
achievement of a cooperative policy in three different cases: blind,
visible, and communicable.
Also, we derive a bound on the speed with which the empirical sequence
converges to the best sequence in probability, so that the multi-agent
learning yields the best cooperative result.
Neural Networks, Vol. 19, No. 1, pp. 62-75, Jan. 2006.
Abstract
We discuss an important property called the asymptotic equipartition
property on empirical sequences in reinforcement learning.
This states that the typical set of empirical sequences has
probability nearly one, that all elements in the typical set are
nearly equi-probable, and that the number of elements in the typical
set is an exponential function of the sum of conditional entropies if
the number of time steps is sufficiently large.
The sum is referred to as stochastic complexity.
Using the property, we elucidate the fact that return maximization
depends on two factors: the stochastic complexity and a
quantity depending on the parameters of the environment.
Here, the return maximization means that the best sequences in terms
of expected return have probability one.
We also examine the sensitivity of the stochastic complexity, which is
a qualitative guide for tuning the parameters of the action-selection
strategy, and show a sufficient condition for return maximization in
probability.
IEICE Transactions on Information and Systems, Vol. E88-D,
No. 1, pp. 135-142, Jan. 2005.
Abstract
We regard the events of a Markov decision process as the outputs from a
Markov information source in order to analyze the randomness of
an empirical sequence by the codeword length of the sequence.
Randomness is an important viewpoint in reinforcement learning,
since learning aims to eliminate randomness and find an optimal policy.
The occurrence of an optimal empirical sequence also depends on the
randomness.
We then introduce Lempel-Ziv coding to measure the randomness,
which consists of two components: the domain size and the stochastic
complexity.
In the experimental results, we confirm that the learning and the
occurrence of an optimal empirical sequence depend on the randomness,
and show that in the early stages the randomness is mainly
characterized by the domain size, whereas as the number of time steps
increases, the randomness depends increasingly on the complexity of the
Markov decision process.
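The parsing step behind a Lempel-Ziv randomness measure can be sketched as LZ78 incremental parsing: count the distinct phrases, a quantity that grows with the randomness of the sequence. This is a generic sketch, not the paper's exact codeword-length measure.

```python
def lz78_phrase_count(sequence):
    """Number of phrases in the LZ78 incremental parsing of a sequence.

    More random sequences parse into more phrases, so the phrase count
    (and hence the codeword length) serves as a randomness measure.
    """
    dictionary = {""}
    phrase = ""
    count = 0
    for symbol in sequence:
        phrase += symbol
        if phrase not in dictionary:      # a new phrase ends here
            dictionary.add(phrase)
            count += 1
            phrase = ""
    if phrase:                            # a final, already-seen phrase
        count += 1
    return count

# A constant sequence parses into fewer phrases than a periodic one.
c_const = lz78_phrase_count("a" * 18)   # a | aa | aaa | aaaa | aaaaa | aaa
c_alt = lz78_phrase_count("ab" * 9)     # a | b | ab | aba | ba | bab | abab | ab
```

Applied to an empirical state-action sequence, a falling phrase count over training would indicate that learning is eliminating randomness, in line with the abstract.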
IEEE Transactions on Neural Networks, Vol. 15, No. 4,
pp. 792-799, July 2004.
Abstract
In this paper, we regard the sequence of returns as outputs from a
parametric compound source.
Utilizing the fact that the coding rate of the source shows the amount
of information about the return, we describe $\ell$-learning
algorithms based on the predictive coding idea for estimating an
expected information gain concerning future information, and give a
convergence proof of the information gain.
Using the information gain, we propose the ratio $w$ of return loss to
information gain as a new criterion to be used in probabilistic action
selection strategies.
In experimental results, we found that our $w$-based strategy performs
well compared with the conventional Q-based strategy.
Keywords
reinforcement learning, predictive coding, information gain,
probabilistic action selection strategy
International Journal of Knowledge-based Intelligent
Engineering Systems, Vol. 6, No. 2, pp. 112-119, Apr. 2002.
Abstract
Many active learning methods for training a partially trained MLP
have been proposed.
We note that the performance of any active learning method depends
on the initial training data.
The initial training data plays an important role in active
learning performance, because any active learning algorithm
generates additional training data based on the existing training data.
Most conventional methods generate the initial data at random
using pseudo-random numbers.
However, in some practical cases, we cannot prepare enough data owing
to limits on time and cost.
Therefore, the bias of the initial data becomes critical,
especially when the initial data set is small.
In this paper, using the $L_{2}$-discrepancy, we verify the effect of the
bias of the initial data on the resulting classification accuracy.
We consequently confirm the advantage of low-discrepancy sequences
over pseudo-random numbers as a method for generating initial
training data.
Finally, we also discuss some advantages and disadvantages of
low-discrepancy sequences.
Keywords
active learning, multi-layer perceptron, initial training data,
pseudo-random number, low-discrepancy sequence
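As a concrete example of a low-discrepancy generator, the following sketches a 2-D Halton sequence (bases 2 and 3), a common choice built from van der Corput sequences; the paper's experiments may use a different sequence, and the names here are illustrative.

```python
def halton(index, base):
    """The index-th element of the van der Corput sequence in the given base:
    reverse the base-b digits of the index across the radix point."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def halton_2d(n):
    """First n points of a 2-D Halton sequence (coprime bases 2 and 3),
    which fill the unit square far more evenly than pseudo-random points."""
    return [(halton(i, 2), halton(i, 3)) for i in range(1, n + 1)]

points = halton_2d(100)
```

Used as initial training inputs for an MLP, such points reduce the sampling bias that pseudo-random initial data can exhibit when the initial set is small.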