Detailed Programme

Feature Selection and Kernel Methods (FSKM)
CHAIR: SEIICHI OZAWA
Monday, March 21, 14h20-16h00

FSKM-1

Title: Boosting Kernel Discriminant Analysis with Adaptive Kernel Selection
Author(s): Shinji Kita, Satoshi Maekawa, Seiichi Ozawa, Shigeo Abe
Abstract: In this paper, we present a new method to enhance classification performance with Boosting by introducing nonlinear discriminant analysis as a feature selection step. To reduce the dependency between hypotheses, each hypothesis is constructed in a different feature space formed by Kernel Discriminant Analysis (KDA). These hypotheses are then combined using AdaBoost. To carry out KDA in each Boosting iteration within realistic time, a new kernel selection method is also proposed. Several experiments on the blood cell and thyroid data sets evaluate the performance of the proposed method. The results show that it nearly matches the best performance of a Support Vector Machine without requiring a tedious parameter search.
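
As a rough illustration of the idea of boosting hypotheses built in different kernel feature spaces, the Python sketch below runs an AdaBoost-style loop in which each round picks a kernel width from a small candidate set by weighted training error. KernelPCA followed by linear discriminant analysis stands in for KDA, and weighted resampling stands in for weight-aware training; the data set, candidate widths, and number of rounds are illustrative assumptions, not the authors' construction.

# Rough sketch only: AdaBoost over hypotheses built in different kernel
# feature spaces. KernelPCA + LinearDiscriminantAnalysis stands in for KDA,
# and weighted resampling stands in for weight-aware training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
y = 2 * y - 1                                    # AdaBoost convention: labels in {-1, +1}

rng = np.random.default_rng(0)
T, gammas = 10, [0.01, 0.1, 1.0]                 # boosting rounds, candidate kernel widths
w = np.full(len(y), 1.0 / len(y))                # sample weights
hypotheses, alphas = [], []

for t in range(T):
    idx = rng.choice(len(y), size=len(y), p=w)   # weighted resampling of the training set
    best = None
    for g in gammas:                             # crude per-round kernel selection
        h = make_pipeline(KernelPCA(n_components=5, kernel="rbf", gamma=g),
                          LinearDiscriminantAnalysis())
        h.fit(X[idx], y[idx])
        err = float(np.sum(w * (h.predict(X) != y)))
        if best is None or err < best[0]:
            best = (err, h)
    err, h = best
    err = float(np.clip(err, 1e-10, 0.499))
    alpha = 0.5 * np.log((1 - err) / err)        # hypothesis weight
    w = w * np.exp(-alpha * y * h.predict(X))    # AdaBoost reweighting
    w = w / w.sum()
    hypotheses.append(h)
    alphas.append(alpha)

F = sum(a * h.predict(X) for a, h in zip(alphas, hypotheses))
print("training accuracy:", float(np.mean(np.sign(F) == y)))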

FSKM-2

Title: Product Kernel Regularization Networks
Author(s): P. Kudova, T. Samalova
Abstract: We study approximation problems formulated as regularized minimization problems with kernel-based stabilizers. These approximation schemes allow the solution to be derived easily as a linear combination of kernel functions (one-hidden-layer feed-forward neural network schemes). We prove the existence and uniqueness of the solution. Building on N. Aronszajn's article on reproducing kernels, we use his formulation of the product of kernels and the resulting kernel space to derive a new approximation scheme, the Product Kernel Regularization Network (PKRN). We present a concrete application of the PKRN, compare it with the classical Regularization Network, and show that the PKRN exhibits better approximation properties.
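
The following sketch illustrates the general shape of a regularization-network solution with a product kernel: the minimizer is a linear combination of kernel functions centred at the training points, f(x) = sum_i c_i K(x, x_i), with coefficients obtained by solving (K + gamma I) c = y. The two factor kernels, the regularization parameter, and the toy data are assumptions for illustration, not the paper's PKRN settings.

# Rough sketch only: a regularization network whose kernel is a product of two
# Gaussian kernels. Widths, regularization parameter, and data are illustrative.
import numpy as np

def rbf(a, b, width):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / width)

def product_kernel(a, b):
    # The product of two kernels is again a kernel (Aronszajn); here two RBF widths.
    return rbf(a, b, 0.5) * rbf(a, b, 5.0)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

gamma = 1e-2
K = product_kernel(X, X)
c = np.linalg.solve(K + gamma * np.eye(len(X)), y)  # (K + gamma*I) c = y
X_test = np.linspace(-3, 3, 200)[:, None]
f_test = product_kernel(X_test, X) @ c              # f(x) = sum_i c_i K(x, x_i)
print("train RMSE:", float(np.sqrt(np.mean((K @ c - y) ** 2))))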

FSKM-3

Title: Statistical Correlations and Machine Learning for Steganalysis
Author(s): Qingzhong Liu, Andrew H. Sung, Bernardete M. Ribeiro
Abstract: In this paper, we present a scheme for steganalysis based on statistical correlations and machine learning. In general, digital images are highly correlated in both the spatial domain and the wavelet domain, and hiding data in these media affects the correlations. Different correlation features are chosen, based on ANOVA (analysis of variance), for different steganographic systems. Several machine learning methods are applied to classify the extracted feature vectors. Experimental results indicate that our scheme is highly effective in detecting the presence of hidden messages produced by several steganographic systems.
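
A minimal sketch of the general pipeline the abstract describes, i.e. ANOVA-based selection of correlation features followed by a learned classifier. The synthetic features and labels, the number of features kept, and the choice of an RBF SVM are illustrative assumptions, not the authors' configuration.

# Rough sketch only: ANOVA-based feature selection followed by a classifier.
# Random features/labels merely stand in for cover/stego correlation features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 60))      # stand-in for spatial/wavelet correlation features
y = rng.integers(0, 2, size=400)    # 0 = cover image, 1 = stego image (synthetic)

clf = make_pipeline(SelectKBest(f_classif, k=15),  # keep the 15 highest ANOVA F-scores
                    SVC(kernel="rbf"))
print("cv accuracy:", float(cross_val_score(clf, X, y, cv=5).mean()))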

FSKM-4

Title: The Use of Multi-Criteria in Feature Selection to Enhance Text Categorization
Author(s): Son Doan, Susumu Horiguchi
Abstract: This paper considers the problem of feature selection in text categorization. Previous work on feature selection often used the filter model, in which features are ranked by a measure and then selected according to a given threshold. In this paper, we present a novel approach to feature selection based on multiple criteria for each feature. Instead of a single criterion, several criteria are used for each feature, and a selection procedure based on a threshold for each criterion is proposed. This framework appears well suited to text data and is applied to feature selection in text categorization. Experimental results on the Reuters-21578 benchmark data show that our approach is promising and enhances the performance of a text categorization system.
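
The sketch below shows one way to read "multi-criteria" selection: a term is kept only if it clears a threshold on every criterion, rather than a single threshold on one ranking measure. The two criteria used here (chi-square and mutual information), the tiny toy corpus, and the 90th-percentile thresholds are illustrative assumptions, not the procedure proposed in the paper.

# Rough sketch only: keep a term when it clears the threshold of *every*
# criterion, instead of one threshold on a single ranking measure.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2, mutual_info_classif

docs = ["the rocket launch was delayed by weather",
        "nasa announced a new space mission to mars",
        "astronauts boarded the shuttle before launch",
        "the telescope captured images of a distant galaxy",
        "the car engine needs new oil and filters",
        "buying a used car requires a careful inspection",
        "the mechanic replaced the brake pads on the sedan",
        "electric vehicles are changing the auto industry"]
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # two toy categories

X = CountVectorizer().fit_transform(docs)
chi_scores, _ = chi2(X, y)                              # criterion 1: chi-square
mi_scores = mutual_info_classif(X, y, discrete_features=True)  # criterion 2: mutual information

keep = (chi_scores >= np.quantile(chi_scores, 0.9)) & \
       (mi_scores >= np.quantile(mi_scores, 0.9))       # must pass both thresholds
print("terms selected:", int(keep.sum()), "of", X.shape[1])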

FSKM-5

Title: Text Classification from Partially Labeled Distributed Data
Author(s): C. Silva, B. Ribeiro
Abstract: One of the main problems with text classification systems is the lack of labeled data, as well as the cost of labeling unlabeled data [1]. Thus, there is growing interest in exploring the combination of labeled and unlabeled data, i.e., partially labeled data [2], as a way to improve classification performance. The ready availability of this kind of data in most applications makes it an appealing source of information. The data are usually available online and distributed in nature, which makes the problem well suited to distributed computing tools delivered by emerging GRID computing environments. We evaluate the advantages obtained by blending supervised and unsupervised learning in an automatic text classifier based on support vector machines. We further evaluate the possibility of learning actively and propose a method for choosing the samples to be learned.
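
As a rough illustration of combining labeled and unlabeled data in an SVM classifier, the sketch below runs a simple self-training loop that pseudo-labels confidently classified samples, then picks the most ambiguous remaining samples as candidates for active-learning queries. The confidence threshold, the margin-based query rule, and the synthetic data are assumptions for illustration, not the authors' method.

# Rough sketch only: self-training with an SVM plus a margin-based rule for
# choosing which unlabeled samples to query. Thresholds and data are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y_true = make_classification(n_samples=500, n_features=50, random_state=0)
labeled = np.zeros(len(y_true), dtype=bool)
labeled[:25] = True                      # only a small seed set starts labeled
y_work = y_true.copy()                   # pseudo-labels are written here

for _ in range(5):
    clf = LinearSVC().fit(X[labeled], y_work[labeled])
    scores = clf.decision_function(X)
    confident = (~labeled) & (np.abs(scores) > 1.5)          # far from the margin
    y_work[confident] = (scores[confident] > 0).astype(int)  # trust these predictions
    labeled |= confident

# Active learning: query the true labels of the most ambiguous remaining samples,
# i.e. those still unlabeled and closest to the decision boundary.
remaining = np.flatnonzero(~labeled)
order = np.argsort(np.abs(clf.decision_function(X[remaining])))
print("would query samples:", remaining[order[:10]])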