Text classification is a popular and important text mining task. Many document
collections are multi-class and some are multi-label. Both multi-class and multi-label
data collections can be handled by decomposing them into binary classification problems. A big
challenge for text classification is noisy text data. This problem becomes
more severe in a corpus with a small set of training documents, especially when few
of them are positive. A set of natural language texts contains a large number
of words. This leads to another important problem for text classification, namely
high-dimensional data, which makes feature selection necessary. A classifier must identify
the boundary between classes optimally. However, even after features are selected, the
boundary remains unclear where positive and negative documents are mixed.
Recently, relevance feature discovery (RFD) has been proposed as an effective
pattern mining-based feature selection and weighting model. Document weights
are significant for ranking relevant information. However, so far, an effective way
to set the decision boundary for ranking relevant information for classification has
not been found. This thesis presents a promising boundary setting method for solving
this challenging issue to produce an effective text classifier, called RFD?. A
classifier combination to boost the effectiveness of the RFD? model is also presented.
The experiments carried out in the study demonstrate that the proposed classifier
significantly outperforms existing classifiers, including state-of-the-art ones.