Testbench for Evaluation of Image Classifiers

J. Šilhavá,
Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic
silhava@fit.vutbr.cz

V. Beran,
Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic
beran@fit.vutbr.cz

P. Chmelař,
Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic
chmelarp@fit.vutbr.cz

A. Herout,
Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic
herout@fit.vutbr.cz

M. Hradiš,
Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic
ihradis@fit.vutbr.cz

R. Juránek,
Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic
ijuranek@fit.vutbr.cz

P. Zemčík,
Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic
zemcik@fit.vutbr.cz


Contents

1. Introduction
2. Classification Background
2.1. Adaptive Boosting – AdaBoost
2.2. Neural Networks
2.3. Support Vector Machines
2.4. The Feature Vectors for the Classifiers
2.5. Classifier Evaluation
3. Testbench Description
3.1. Part One - Data Generation
3.2. Part Two - Training and Testing
3.3. Part Three - Evaluation
3.4. Training on a Cluster of Computers
3.5. Web Access to the System
4. Sample Experiments and Their Results
5. Conclusion and Future Work
6. Acknowledgements
References

Abstract

Classifiers used in image processing and computer vision are a frequent subject of research and are widely exploited in applications. This contribution does not directly involve research in classification itself; rather, it introduces a systematic approach to the evaluation of image classifiers, the comparison between classifiers, and the “tuning” of classifiers for particular applications. The proposed approach is implemented in an open software system for the evaluation of image classifiers. The contribution also demonstrates the application of the system to several selected classifiers and discusses the possibilities and results.

Keywords: Classifier, Image processing, Computer vision, AdaBoost, Support Vector Machine, Artificial Neural Network, Face detection, Optical character recognition, Classifier Evaluation.

1. Introduction

In image processing and computer vision, classifiers are typically used to perform pattern recognition [Theodoridis et al., 2003]. The known image classifiers (namely those involved in this paper) exhibit mutually similar descriptive power. Differences between the classifiers do exist, however, across different applications. Therefore, it is not possible to reject one classification principle in favor of the others in general; instead, it is necessary to choose carefully among the available classifiers for a particular purpose and hardware execution platform.

This paper describes a common comparison protocol and a software system based on this protocol, and offers both for wider use. When developing specialized improvements to one of the classification methods (AdaBoost, described further), the authors felt the need to compare the modified algorithm with other generally used classification methods and with other versions of the investigated algorithm. The system, however, evolved into a more general evaluation platform for classifiers.

A summary of the classification background is given in Section 2. The evaluation protocol and the defined communication formats are described in Section 3. Practical experience with the system, a first comparison of the tested methods, and the demands for future development are summarized in Sections 4 and 5.

2. Classification Background

Visual data are rich in hidden information that many applications need to acquire. Classification is a supervised data analysis technique developed to perform such a task; it extracts models describing and distinguishing known data classes, which can then be used to predict the labels of unknown objects [Han and Kamber, 2006]. In machine learning and vision, classification is concerned with inferring hidden parameters from visible features. This process can be adversely affected by information loss due to illumination changes, noise, etc., and can be seen as a channel in the sense of information theory.


Fig. 1: The classification channel


The process includes the observation of the real-world data source, i.e. image acquisition. Images are then preprocessed and features are extracted from them. This process forms a syntactic description, most commonly presented as a feature vector x. The classification function can be considered a mapping from the feature space to a set of (semantic) class labels, η: X → Y, specifically η(x) = y.

The classification task can be performed as the maximization of the conditional (posterior) probability P(y | x) given the training data, which is a set of tuples {(x1, y1), …}, using Bayes' theorem:

$$\eta^*(x) = \arg\max_{y} P(y \mid x) = \arg\max_{y} P(x \mid y)\, P(y) \tag{1}$$

The evidence P(x) does not depend on y and is therefore omitted from the maximization.

The optimal (Bayesian) classifier is thus able to make decisions based on the prior information P(y), a simple statistic over the training data, and the likelihood P(x | y) [Han and Kamber, 2006].
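As an illustration of Eq. (1), the following minimal sketch estimates P(y) and P(x | y) by counting over a toy training set and classifies by the arg max rule. The single discrete feature, the class names and the function names are assumptions made for this example only; they are not part of the testbench.

```python
from collections import Counter, defaultdict

def train_bayes(samples):
    """Estimate P(y) and P(x | y) by counting over (x, y) training tuples."""
    class_counts = Counter(y for _, y in samples)
    joint_counts = defaultdict(Counter)
    for x, y in samples:
        joint_counts[y][x] += 1
    total = len(samples)
    prior = {y: n / total for y, n in class_counts.items()}
    likelihood = {
        y: {x: n / class_counts[y] for x, n in counts.items()}
        for y, counts in joint_counts.items()
    }
    return prior, likelihood

def classify(x, prior, likelihood):
    """Return the label y that maximizes P(x | y) * P(y)."""
    return max(prior, key=lambda y: likelihood[y].get(x, 0.0) * prior[y])

# Hypothetical toy data: one discrete feature value per sample.
training = [
    ("dark", "face"), ("dark", "face"), ("bright", "face"),
    ("bright", "background"), ("bright", "background"), ("bright", "background"),
]
prior, likelihood = train_bayes(training)
label_for_dark = classify("dark", prior, likelihood)
label_for_bright = classify("bright", prior, likelihood)
```

In practice the feature vectors are high-dimensional and continuous, so the likelihood cannot be tabulated this way; that is one reason the other methods discussed below are used.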

However, many other classification methods have been proposed by researchers in statistics, machine learning and pattern recognition [Mierswa et al., 2006]. They are based on the induction of rules or on associations present in the feature space, on information gain or the distance of the attributes, on mapping to a different number of dimensions (kernel methods), or on back-propagation used to minimize the mean classification error (artificial neural networks).

These algorithms differ not only in the methods used to build their models but also in prediction accuracy. Thus an independent test set is used to compute the error based on the difference between the predicted and the known ground-truth value y. The classification techniques may be supplemented by ensemble methods (such as boosting) that increase the classification precision by combining various classifiers.

The overall performance of classification algorithms depends strongly on the features used; developers should use as much (prior) information as they have. Therefore, this paper also focuses on the comparison of classifiers that differ only in the types of features extracted from images, in both the spatial and the frequency domain. An overview of the classification methods used is given below.

2.1. Adaptive Boosting – AdaBoost

AdaBoost in its basic form greedily selects weak hypotheses (classifiers of low or medium precision) that are only moderately accurate and combines them into a very accurate classifier. The output of such a classifier is based on a linear combination of the selected weak hypotheses. The weak hypotheses can be of arbitrary complexity, but in many cases they are very simple (e.g. based on the response of a convolution with a wavelet).

AdaBoost was first introduced by Freund and Schapire [1997], and many modifications have been proposed since then. In the original algorithm, the output of the weak hypotheses is restricted to a binary value, and the algorithm is thus referred to as discrete AdaBoost. Schapire and Singer [1999] introduced real AdaBoost, which allows confidence-rated predictions and is most commonly used in combination with domain-partitioning weak hypotheses (e.g. decision trees).

AdaBoost was first used for object detection in images by Viola and Jones [2001] in combination with Haar wavelets. They also used a cascade of classifiers to reduce the average number of evaluated weak hypotheses. Another way to trade off classification precision against time was proposed by Šochman and Matas [2005]. Their WaldBoost keeps the linear structure of the classifier and selects early-termination thresholds on the strong classifier sum. The early termination is used in combination with bootstrapping, which allows an extremely high number of samples to be used in the training process. Bootstrapping is most advantageous in detection tasks, where the number of available “background” samples is almost unlimited.

The weak classifiers and their weights selected by AdaBoost are not optimal because the process is greedy. Some work has addressed this fact, e.g. the totally corrective step [Šochman and Matas, 2004] or FloatBoost [Li et al., 2002].

AdaBoost has proved to be resistant to overfitting, which is due to the fact that, with growing complexity of the classifier, it increases the margins between the samples of different classes; it can nevertheless overfit in the presence of noise. This is in many ways similar to SVM (Support Vector Machines, described below). The computational complexity of AdaBoost training is relatively low: it does not depend on the number of previously selected weak hypotheses and grows only linearly with the number of training samples and with the number of available weak hypotheses to choose from. This allows a relatively large number of weak hypotheses and training samples to be used, which implies more reliable classifiers and further improves the resistance to overfitting.
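The greedy selection and re-weighting described in this section can be sketched as follows. This is a minimal illustration of discrete AdaBoost with threshold-based weak hypotheses (decision stumps) on a toy one-dimensional problem; the stump representation, the exhaustive candidate search and the toy data are assumptions of this sketch, not the implementation used in the testbench.

```python
import math

def stump_predict(stump, x):
    """Weak hypothesis: threshold on one feature, output in {-1, +1}."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with an exhaustive search over decision stumps."""
    n = len(X)
    weights = [1.0 / n] * n
    candidates = [(f, x[f], p) for f in range(len(X[0])) for x in X for p in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def weighted_error(stump):
            return sum(w for w, xi, yi in zip(weights, X, y)
                       if stump_predict(stump, xi) != yi)
        # Greedy step: pick the stump with the lowest weighted error.
        best = min(candidates, key=weighted_error)
        err = min(max(weighted_error(best), 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Misclassified samples gain weight, correct ones lose it; then normalize.
        weights = [w * math.exp(-alpha * yi * stump_predict(best, xi))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def strong_classify(ensemble, x):
    """Sign of the linear combination of the selected weak hypotheses."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy problem that no single stump can solve: the negatives lie in the middle.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = train_adaboost(X, y, rounds=3)
predictions = [strong_classify(ensemble, x) for x in X]
```

Each round scans all candidates once over all samples, which reflects the linear growth of the training cost with the number of samples and of weak hypotheses noted above.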

2.2. Neural Networks

An Artificial Neural Network is considered to be a distributed parallel data processing structure composed of a usually very high number of mutually connected efficient elements called neurons. Each of them can receive an arbitrary number of inputs at the same time. Neurons transform the input data to the output according to a given transmission function. Neural network models are specified by the network topology, the neuron transmission functions, and the training or learning rules. These rules specify an initial set of weights and indicate how the weights should be adapted during the training process.

Development of neural network models began more than 60 years ago with the work of McCulloch and Pitts [1943]. The original perceptron convergence procedure for adjusting weights was developed by Rosenblatt [1959]. Minsky and Papert [1969] showed the limitations of single-layer perceptrons, but these have been overcome by multi-layer perceptrons and the back-propagation learning algorithm [Rumelhart et al., 1986]. A demonstration of the power of this algorithm was provided by Sejnowski and Rosenberg [1986]. Recent uses of neural networks in the field of 2D object recognition are described, for example, in [Gllavata et al., 2004] and [Cantoni and Petrosino, 2000].

2.3. Support Vector Machines

Support Vector Machine (SVM) is an algorithm for the classification of linearly separable data. The training process finds a (maximum margin) hyperplane separating the data using essential training tuples called support vectors, as originally described by Vapnik and Lerner [1963]. For data that are not linearly separable, kernels can be used to transform the original data into a higher number of dimensions in which a separation can be found, as described by Boser, Guyon and Vapnik [1992].

SVMs are widely used in machine vision systems; they are considered to be one of the best learning techniques for content-based image retrieval [Gosselin and Cord, 2004]. The advantage of SVM is that it uses kernels, which can handle large feature spaces with high classification accuracy and performance, even in multi-class tasks with data that are not linearly separable. The disadvantage of SVM is the need for manual selection of an appropriate kernel and its parameters [Howley and Madden, 2005].

2.4. The Feature Vectors for the Classifiers

The features strongly influence the quality of the resulting classifier. Generally, it is desirable to use features with the highest discriminative power but at the same time to keep their number as low as possible. The requirement for discriminative power is quite natural, as the classifier can make decisions only if enough information is available. Moreover, the more relevant information is available, the more reliable the resulting classification rules generally are. The obvious reason for keeping the number of features low is computational complexity. Another, less obvious reason is that with a higher number of free classifier parameters (e.g. neuron weights), a higher number of training samples is needed to maintain the generalization properties of the classifier.

Besides application-specific feature design concepts, two general approaches exist to reduce the number of features while preserving the discriminative power: feature selection and data transformation. Linear transforms, such as principal component analysis, linear discriminant analysis and independent component analysis, are frequently used and do not require any a priori knowledge about the classification problem. Transformations with suitable fixed basis vectors, such as the discrete cosine transform or wavelet transforms, can be used when working with signals. Feature selection is essentially a search for a subset of features which maximizes some criterion function. Many algorithms can be used for this purpose, as can many criterion functions. Information on this topic can be found in [Molina et al., 2002].
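Feature selection as a search for a subset maximizing a criterion function can be illustrated by greedy forward selection, one of the simplest such search strategies. The sketch below is an assumption-laden example: the feature names, the per-feature scores and the redundancy penalty are hypothetical, and a real criterion would typically be an estimate of classification accuracy.

```python
def forward_selection(features, criterion, k):
    """Greedily add the feature that most improves criterion(subset), up to k features."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical criterion: sum of individual scores, minus a penalty
# when two redundant features (here "haar_1" and "haar_2") are both chosen.
scores = {"haar_1": 3.0, "haar_2": 2.9, "gabor_1": 2.0, "lbp_1": 1.0}

def toy_criterion(subset):
    value = sum(scores[f] for f in subset)
    if "haar_1" in subset and "haar_2" in subset:
        value -= 2.5
    return value

chosen = forward_selection(scores, toy_criterion, k=2)
```

The greedy search evaluates the criterion only for single-feature extensions of the current subset, so it is fast but not guaranteed to find the best subset.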

In many computer vision classification tasks, it has been found that feature selection from an overcomplete set of wavelet features yields very good results. In detection tasks, Haar features are often combined with some modification of the AdaBoost classifier [Viola and Jones, 2001]. When time is not so critical, Gabor wavelets are preferred for their higher descriptive power. For example, Bartlett et al. [2005], when experimenting with facial expression recognition, achieved the best results with an SVM trained on a subset of Gabor wavelets selected by AdaBoost. In this work, the focus is on this kind of classification scenario, where the classifier is trained using features selected from a pool of wavelets or other simple features.
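Haar features are popular in detection tasks largely because they can be evaluated in constant time from an integral image. The following minimal sketch shows this for a two-rectangle horizontal feature; the function names and the toy image are assumptions of this example and do not describe the testbench's C++ implementation.

```python
def integral_image(image):
    """ii[r][c] is the sum of image[0..r-1][0..c-1]; the extra zero row and
    column remove the need for boundary checks."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = image[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half of a (height x width) window; width must be even."""
    half = width // 2
    return (box_sum(ii, top, left, height, half)
            - box_sum(ii, top, left + half, height, half))

# Toy image with a vertical edge: bright left half, dark right half.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
ii = integral_image(image)
edge_response = haar_two_rect_horizontal(ii, 0, 0, 2, 4)
```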

2.5. Classifier Evaluation

The selected features and classification methods can be evaluated and compared using the following criteria. The accuracy of a classifier refers to its ability to correctly predict the class label of a previously unseen object (or feature vector). Additionally, classification models are evaluated using the Receiver Operating Characteristic (ROC). ROC curves show the trade-off between the true-positive rate (objects correctly identified) and the false-positive rate (negative examples identified as objects). The model can then be adjusted to operate at the Equal Error Rate (EER) point, where the false-positive rate equals the false-negative rate.

Another characteristic [Han and Kamber, 2006] is robustness, the ability to classify noisy and modified data correctly. The computational cost of the classifier (performance) and its scalability, i.e. the ability of the classifier to train and test on an arbitrary amount of data, are also of interest. The last characteristic of interest, interpretability, is a subjective measure and cannot be (easily) determined by an automated evaluation system.
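The ROC curve and the EER point can be computed from classifier scores and ground-truth labels as sketched below. This is a minimal illustration that approximates the EER by the ROC operating point at which the false-positive and false-negative rates are closest; the scores and labels are made-up toy values, and the function names are assumptions of this example.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.
    A sample is accepted as positive when its score is >= the threshold."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l != 1)
        points.append((fp / negatives, tp / positives))
    return points

def equal_error_rate(points):
    """Approximate the EER as the point where FPR and FNR = 1 - TPR are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0

# Toy classifier outputs (higher score means "more likely an object").
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
points = roc_points(scores, labels)
eer = equal_error_rate(points)
```

With few samples the two error rates may never be exactly equal, which is why the sketch averages them at the closest point; interpolating between neighboring ROC points is a common alternative.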

3. Testbench Description

The testbench is designed to focus the comparison on several aspects. First, the overall performance of the different classification engines is compared to give a hint for selecting a classifier for a particular purpose. In addition, the system should compare different modifications (generations, versions) of a single classifier engine to allow the quantitative evaluation that is helpful in the classifier development undertaken at our institution. New features are evaluated by this comparison engine, as are modifications to the existing ones. Various thresholds and other parameters of the methods are also tuned and evaluated. To be able to run a large number of experiments testing various setups of features or parameters, distributed operation is required of the comparison engine; it is described later in this section.

The robustness of the classifiers and of the used features against inconsistencies in the data sets is also tested. The training data sets are not always annotated very precisely; the rotation, scale, and center position of the samples vary within some ranges (following a distribution). The quality of a classifier trained on better or worse training data must be evaluated, as it is an important characteristic of the method used. The testing samples may also exhibit some diversity in geometric deformations. A classifier that performs well on a heavily modified testing set is desirable, as its invariance, especially to scale and rotation (and also to noise and blur), is higher.

To sum up, mostly the following properties of the classifiers and their modifications need to be tested by the evaluation system:

It is possible to divide the whole testbench into three parts, the dataset generation part, training and testing part, and evaluation part (see Fig. 2 and Fig. 3).

3.1. Part One – Data Generation

The first part is the dataset generation part. In addition to using existing datasets, several new datasets have been created, together with some tools that speed up the annotation and dataset generation.

The process of producing a new dataset in the presented system is depicted in Figure 2. First, regions of interest (ROI) are annotated and stored by an annotation program. The annotated images are processed by a generator, which can produce a number of datasets in RAW format from a single annotated input. The generator can perform transformations such as rotation, scaling, shifting and shearing; it can also change brightness, contrast and gamma, and add noise to the images. The amount of each of these distortions is controlled by a configuration file defining the ranges or distributions. Depending on the input data, the generator produces either a positive sample dataset (examples of the class of object to be searched for) or a negative sample dataset (counterexamples). These datasets are joined into a single file, which is accompanied by an annotation defining the searched class for each data sample and serving as the ground truth. This dataset and its annotation go to the mixer, which randomly assigns samples to the training and testing datasets so that no sample is included in both. This operation can be repeated N times to obtain more variations of the training and testing datasets.
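The behavior of the mixer can be sketched in a few lines. The example below is only an illustration of the idea (random, disjoint training/testing splits repeated N times); the function name, the train_fraction parameter and the sample representation are assumptions and do not describe the actual tool.

```python
import random

def mix(samples, train_fraction, repetitions, seed=0):
    """Yield `repetitions` random (training, testing) splits of `samples`.
    Within one split, no sample appears in both parts."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        yield shuffled[:cut], shuffled[cut:]

# Hypothetical annotated samples: (sample id, class label) pairs.
samples = [(i, i % 2) for i in range(8)]
splits = list(mix(samples, train_fraction=0.75, repetitions=3))
```

A fixed seed keeps the generated splits reproducible, which matters when different classifiers are to be compared on exactly the same data.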


Fig. 2: Datasets generation part. N is the number of repetitions


3.2. Part Two – Training and Testing

The second part is the training and testing part (see Fig. 3). It is possible to use training and testing datasets made using the generation tool chain described above or to use external testing and/or training datasets.

First, features suitable for classification are extracted from the data. Haar and Gabor wavelets, local binary patterns (LBP) and new experimental features under development (LRD, local rank differences) are currently used to transform the data, and AdaBoost can be used to select the set of relevant features. This part is integrated into a custom C++ solution which was designed to be memory and time efficient and modular. The modularity makes it relatively easy to add new kinds of features, data sources and feature selection algorithms. The feature extraction process is controlled by two XML files, one for configuration and one describing the data sources. In the testbench, both of these XML files are generated automatically; it is only necessary to specify the feature extraction variations in one text file. The resulting feature vectors are stored in a simple text format.
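To make one of the feature types concrete, the following sketch computes the basic 8-bit local binary pattern of a 3x3 neighborhood. It shows the general LBP idea only; the neighbor ordering and the use of ">=" in the comparison are one common convention assumed here, and the variant implemented in the testbench may differ.

```python
def lbp_code(image, row, col):
    """8-bit local binary pattern of the 3x3 neighborhood centered at (row, col).
    Each neighbor contributes one bit: 1 if it is >= the center pixel."""
    center = image[row][col]
    # Clockwise neighbor offsets starting at the top-left pixel (assumed convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code

# Toy 3x3 patches (grey levels).
flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
ramp_patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_code = lbp_code(flat_patch, 1, 1)
ramp_code = lbp_code(ramp_patch, 1, 1)
```

Because the code takes one of 256 discrete values, such a feature naturally partitions the instance space, which is the property exploited by the domain-partitioning weak hypotheses of real AdaBoost.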

Besides the feature selection, the implemented AdaBoost can be also used as one of the classifiers. The two-class discrete and real versions of AdaBoost are currently implemented [Freund and Schapire, 1997] [Schapire and Singer, 1999]. The real AdaBoost currently supports domain partitioning weak hypotheses. Decision trees are used to partition the instance space in the case of real valued features, but features naturally partitioning the instance space (e.g. LBP) are also supported. Some additional options are also available, such as alpha quantization which is used when training classifiers for hardware execution platforms.

Another classification method integrated in the presented testbench is the artificial neural network. The neural network used is a multilayer perceptron (MLP) with three layers. A sigmoid function is used as the activation function in the hidden layer, and softmax [Bridle 1990] is used as the activation function in the output layer. The network is trained using the back-propagation algorithm [Rumelhart et al., 1986] with gradient descent, minimizing the network error, which is measured by the cross entropy between the obtained and the desired output [Ney 1995]. The training dataset is divided into training and cross-validation datasets, so three disjoint datasets are used in the experiments: the training set, the cross-validation set and the testing set. The New Bob algorithm uses the cross-validation set to control the learning rate of the neural network and stops the training at an appropriate time to avoid over-training. Training is performed with QuickNet, which is a part of the SPRACHcore software package available at ISCI [ISCI 2004]. It uses highly optimized matrix-matrix multiplications [Farber 1997] to achieve better performance, even when the sequential version of the QuickNet MLP back-propagation training is used.
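The structure of the network described above (sigmoid hidden layer, softmax output layer, cross-entropy error) can be sketched as a forward pass in plain Python. This is an illustration only, not QuickNet code; the function names, the layer sizes and the zero weights in the usage example are assumptions, and the back-propagation update itself is omitted.

```python
import math

def sigmoid(v):
    """Logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-v))

def softmax(values):
    """Softmax activation of the output layer (shifted by the max for stability)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def layer(inputs, weights, biases):
    """Affine map: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> sigmoid hidden layer -> softmax output layer."""
    hidden = [sigmoid(v) for v in layer(x, hidden_w, hidden_b)]
    return softmax(layer(hidden, out_w, out_b))

def cross_entropy(predicted, target_index):
    """Error between the obtained output and a one-hot desired output."""
    return -math.log(predicted[target_index])

# Hypothetical network: 2 inputs, 3 hidden units, 2 output classes, all-zero weights.
output = forward([1.0, 2.0], [[0.0, 0.0]] * 3, [0.0] * 3, [[0.0] * 3] * 2, [0.0] * 2)
error = cross_entropy(output, 0)
```

The softmax outputs sum to one and can be read as class posteriors, which is what makes the cross entropy a natural error measure for this architecture.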

This testbench uses the LIBSVM [Chang and Lin, 2001] implementation of the SVM. It may be supported by automatic selection of the kernel (linear, RBF) and of the parameters (the penalty and the kernel parameter) using a genetic algorithm and cross-validation [Mierswa et al., 2006]. However, the authors are working on faster recursive scaling and combination of parameters in octaves (one way progress).

3.3. Part Three – Evaluation

The third part is the evaluation part. All necessary steps have been taken to make it possible to compare and evaluate the various dependencies mentioned above. The results from all classification methods have the same format to allow uniform processing by the evaluation tools. It is possible to calculate the Receiver Operating Characteristic (ROC) and Detection Error Tradeoff (DET) curves and to compute the Equal Error Rate (EER) [Egan 1975]. This part of the whole process requires the most future improvement to obtain more reliable results in a more automated way, as mentioned in Section 5.



Fig. 3: Training and testing part, Evaluation part. N is the number of repetitions. M is the number of feature configurations


3.4. Training on a Cluster of Computers

For training, a grid of computers managed by SGE (Sun Grid Engine) is used. This approach allows many different training experiments to run in parallel. The computation cluster available for these evaluation purposes consists of approximately 100 computers, each of which has up to 4 slots for different jobs. The jobs are specified by an execution script that defines which task should be executed and on which cluster nodes. When many similar jobs are to be run, the system allows a single script to be executed as an array job [Sun 2005].

The task of a user who wants to compare different implementations of classifiers is to prepare the data, the experiment configurations and the execution scripts, and then to submit the jobs to the grid engine. The system simply takes the unassigned job with the highest priority or the longest waiting time from the queue, assigns it to a free slot, and executes its script. The whole process can be controlled from one terminal although the computational power is distributed across many computers (in our case, several dedicated blade servers are used, as well as classroom computers at times when they are not used by students).
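With an array job, every task runs the same script and distinguishes itself only by its task index, which SGE exposes in the SGE_TASK_ID environment variable (array jobs are submitted with the -t option of qsub, e.g. -t 1-9). The sketch below shows one way a task could map its index to a combination of a dataset repetition and a feature configuration; the dataset and configuration names and the mapping itself are hypothetical and are not taken from the testbench scripts.

```python
import os

def experiment_for_task(task_id, datasets, feature_configs):
    """Map a 1-based array-job task index to one (dataset, feature config) pair."""
    index = task_id - 1
    if not 0 <= index < len(datasets) * len(feature_configs):
        raise ValueError("task id out of range")
    return (datasets[index // len(feature_configs)],
            feature_configs[index % len(feature_configs)])

datasets = ["split_01", "split_02", "split_03"]   # hypothetical names, N = 3
feature_configs = ["haar", "lrd4", "lrd6"]        # hypothetical names, M = 3

# Outside an array job the variable is missing or not numeric; fall back to task 1.
raw_id = os.environ.get("SGE_TASK_ID", "1")
task_id = int(raw_id) if raw_id.isdigit() else 1
dataset, features = experiment_for_task(
    min(task_id, len(datasets) * len(feature_configs)), datasets, feature_configs)
```

With N dataset repetitions and M feature configurations, a single array job of N x M tasks then covers the whole experiment grid.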


Fig. 4: Grid scheme. User submits jobs that are automatically assigned to execution hosts


Several tools and shell scripts have been developed to simplify the use of SGE in the testbench. The main script creates the necessary directories for training results and generates the desired number of datasets. Another script then submits the AdaBoost training processes with the generated datasets to the engine. AdaBoost selects features and generates feature vectors for the neural network and the SVM. When the vectors have been generated, the neural network and SVM jobs are also submitted to the engine. Therefore, the effort of training AdaBoost, the neural network and the SVM is reduced to creating the base dataset and the experiment configurations.

3.5. Web Access to the System

A web repository for the presented evaluation platform is available at [UPGM 2007]; it contains a description of the platform structure and of the datasets, along with the appropriate toolkits. The datasets in RAW format are about to be shared there, along with the feature vector text files and the test results. An option is being prepared that will allow anyone to upload their own results for the given tests and compare them with the built-in solutions.

4. Sample Experiments and Their Results

To demonstrate the functionality of the evaluation testbench, several tests have been performed with different data sets, namely a set of handwritten digits (the MNIST dataset) and data sets of human faces.

The handwritten digit set was used to demonstrate the comparison of the invariance of the classifiers to rotation. Digits 0 and 1 are used in this small experiment; for each of the digits (with all the other digits as counterexamples), 5 data sets were generated with different ranges of random rotation (0°, 5°, 20°, 45°, 90°). Haar wavelet features were used in these experiments (AdaBoost was used to select the best 120 features from the total set). The training set contained 20 000 samples of interest and 20 000 counterexamples; the tests involved 10 000/5 000 samples.


Fig. 5: Example of the generated data sets (20x20 pixels each digit); columns: different specimens of digit 3, rows: different amount of rotational distortion


Digit  Classifier  Error rate for rotation range
                   0°       5°       20°      45°      90°
0      AB          0.12%    0.27%    0.55%    0.97%    1.62%
0      ANN         2.14%    2.40%    2.06%    3.22%    4.28%
0      SVM         0.09%    0.45%    2.78%    6.21%    25.99%
1      AB          0.14%    0.30%    0.66%    0.75%    1.18%
1      ANN         1.59%    1.34%    1.71%    1.87%    2.19%
1      SVM         0.01%    0.39%    1.82%    3.28%    5.91%

Table 1: EER of each classifier depending on the rotational distortion of the training/testing data sets


Fig. 6: EER depending on the rotational distortion of the data set – Digit 0


Fig. 7: EER depending on the rotational distortion of the data set – Digit 1


See Figures 6 and 7 for a sample of the evaluation results. The graphs (and the corresponding Table 1) show that AdaBoost and the neural network remain fairly stable even when a large amount of rotation is applied to the images, whereas the error of the SVM grows considerably (up to 25.99% for digit 0 at 90°). The neural network exhibits a high error rate even for a small amount of rotation, which may indicate over-training or a similar flaw that calls for examination.


Feature  Classifier  Error rate for feature vector length
                     10       20       50       100      120
Haar     AB          2.14%    2.40%    2.06%    3.22%    4.28%
Haar     ANN         8.89%    8.11%    7.43%    6.51%    6.00%
Haar     SVM         3.74%    2.66%    2.32%    2.14%    -
LRD4     AB          4.08%    2.81%    1.80%    1.19%    1.08%
LRD4     ANN         5.36%    4.48%    3.82%    3.49%    5.80%
LRD4     SVM         4.13%    3.11%    2.04%    1.45%    -
LRD6     AB          4.20%    3.00%    1.88%    1.27%    1.20%
LRD6     ANN         4.69%    3.78%    4.04%    3.50%    3.44%
LRD6     SVM         3.89%    2.81%    1.87%    1.44%    -

Table 2: EER depending on the length of the classifier and the used feature set


Fig. 8: EER of SVM depending on length of the feature vector for different feature sets. Logarithmic scale


Fig. 9: EER of AdaBoost depending on length of the feature vector for different feature sets. Logarithmic scale


The dataset of human faces [Šochman, Matas 2005] (10 000 faces, 20 000 non-faces, 24x24 pixels) is used to demonstrate the dependence of the classifier on the length of the feature vector. This experiment used feature vectors of various lengths (10, 20, 50, 100, and 120) selected by the AdaBoost training process. Each of the lengths was tested with three different sets of features (Haar wavelets, LRD4 and LRD6; LRD stands for Local Rank Difference, a new kind of feature under development, and 4 and 6 are parameters of the features determining their maximal size).

Figures 8 and 9, together with Table 2, show an experiment comparing the behavior of different feature sets. Note in Figure 8 that the SVM reacts notably worse to the commonly used Haar features than to the newly developed ones.

All of the example results were obtained from the distributed evaluation system described in this paper. Most of the work was done automatically, though some parts of the operation, namely the final evaluation, required an operator’s assistance. These parts of the system need future work, which is sketched in the following section.

5. Conclusion and Future Work

The purpose of this contribution is to present a system for evaluation and comparison of image classifiers and also to propose it for public use. The original motivation of the system was both to allow research and development of the classification algorithms and the features and to help select a suitable classifier for a particular application.

An important part of the system is the dataset generation tool set, which generates new datasets based on manually annotated or obtained ones by adding noise, geometric deformations (rotation, scale, skew, …), etc. The training part of the system is capable of running the AdaBoost classifier first to select a suitable feature set for the other two employed classifiers that are run afterwards. As running the training and evaluation on many combinations of the automatically generated data sets is computationally very intensive, parallel distribution across a grid of computers is automatically used in the system.

The vital parts of the system (dataset generation, feature selection, training, computational load distribution, much of the evaluation) are already functional, mostly automatic, and routinely used. Yet some work still needs to be done to achieve fully automatic operation, namely in the evaluation of the results. On the other hand, the classifiers will always need some manual tuning by a human expert, so fully automatic operation is neither possible nor expected.

Future work should include further improvements in error measures, based for example on application-specific custom error metrics, better evaluation of the generalization ability of the classifiers, and overall improvements in the usability of the system.

The presented system seems to be helpful in development of the classification algorithms and the used features. Namely the AdaBoost classification engine is in the focus of the authors and there are some promising results in development of suitable features for hardware acceleration of AdaBoost. The evaluation similar to the one discussed here is indispensable in this research.

The authors would be more than happy to include any other classifier implementation (or just classifier results on a given test) into the comparison, and offer to supply detailed information on the interface any classifier must meet to be included into the evaluation system.

6. Acknowledgements

This work has been supported by the “Centre of Computer Graphics” (CPG-LC06008), Czech Ministry of Education, Youth, and Sports, CareTaker, IST EU project number 027231, and Czech Grant Agency, project GA201/06/1821 “Image Recognition Algorithms”.

References

Bartlett, M. et al. 2005. Recognizing Facial Expression: Machine Learning and Application to Spontaneous Behavior. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2, 568-573.

Boser, B. E., Guyon, I. M., Vapnik, V. N. 1992. A training algorithm for optimal margin classifiers. In 5th Annual ACM Workshop on COLT.

Bridle, J. S. 1990. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition. In Neurocomputing: Algorithms, Architectures and Applications, F. Fogelman Soulié and J. Hérault Eds., NATO ASI Series, 227-236.

Cantoni, V., Petrosino, A. 2000. 2-D Object Recognition by Structured Neural Networks in a Pyramidal Architecture. In Proceedings of the Fifth IEEE International Workshop on Computer Architectures for Machine Perception (CAMP 00), 0-7695-0740-9/00.

CBCL at MIT. 2005. CBCL face dataset, http://cbcl.mit.edu/software-datasets/FaceData2.html.

Chang, C., Lin, C. 2001. LIBSVM: A Library for Support Vector Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm

Egan, J. P. 1975. Signal Detection Theory and ROC Analysis. Academic Press.

Farber, P. 1997. Quicknet on MultiSpert: Fast Parallel Neural Network Training. International Computer Science Institute, Berkeley, TR-97-047.

Freund, Y., Schapire, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. In Journal of Computer and System Sciences, 55(1):119-139.

Gllavata, J., Ewerth, R., Freisleben, B. 2004. A Text Detection, Localization and Segmentation System for OCR in Images. In IEEE, Proceedings of the IEEE Sixth International Symposium on Multimedia Software Engineering (ISMSE 04), 0-7695-2217-3/04.

Gosselin, P. H., Cord, M. 2004. A comparison of active classification methods for content-based image retrieval. In ACM Proceedings of the 1st international workshop on Computer vision meets databases.

Han, J., Kamber, M. 2006. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, ISBN 1558609016.

Howley, T., Madden, M. 2005. The Genetic Kernel Support Vector Machine: Description and Evaluation. In Artificial Intelligence Review.

ISCI. 2004. The SPRACHcore software package. http://www.icsi.berkeley.edu/~dpwe/projects/sprach/sprachcore.html

Li, S. et al. 2002. FloatBoost learning for classification. In: S. Thrun, S. Becker, and K. Obermayer, editors, NIPS 15. MIT Press.

McCulloch, W. S., Pitts, W. 1943. A Logical Calculus of the Ideas Immanent in Nervous Activity. In: Bulletin of Mathematical Biophysics, 5:115-133.

Mierswa, I. et al. 2006. YALE: Rapid Prototyping for Complex Data Mining Tasks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Minsky, M., Papert, S. 1969. Perceptrons. MIT Press, Cambridge, Mass.

Molina, L. C., Belanche, L., Nebot, A. 2002. Feature selection algorithms: a survey and experimental evaluation. In: IEEE International Conference on Data Mining 2002.

Ney, H. 1995. On the Probabilistic interpretation of Feedforward Classification Network. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 2.

Rosenblatt, F. 1959. Principles of Neurodynamics. New York, Spartan Books.

Rumelhart, D. E., Hinton, G. E., Williams, R. J. 1986. Learning Internal Representations by Error Propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations. MIT Press.

Schapire, R., Singer, Y. 1999. Improved boosting algorithms using confidence-rated predictions. In: Machine Learning, 37(3):297-336.

Sejnowski, T., Rosenberg, C. R. 1986. NETtalk: A Parallel Network that Learns to Read Aloud. Johns Hopkins Univ. Technical Report JHU/EECS-86/01.

Šochman, J., Matas, J. 2004. AdaBoost with Totally Corrective Updates for Fast Face Detection. In Sixth IEEE International Conference on Automatic Face and Gesture Recognition, p. 445.

Šochman, J., Matas, J. 2005. WaldBoost - Learning for Time Constrained Sequential Detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2.

Sun Microsystems. 2005. N1 Grid Engine 6 User Guide.

Theodoridis, S., Koutroumbas, K. 2003. Pattern Recognition. Academic Press, USA. ISBN: 0-12-685875-6.

UPGM. 2007. http://www.fit.vutbr.cz/research/groups/graph/

Vapnik, V. N., Lerner, A. 1963. Pattern recognition using generalized portrait method. In: Automation and Remote Control.