Semantic argument classification is the process of analyzing a sentence to identify the pattern of WHO did WHAT to WHOM, WHEN, WHERE, WHY, and HOW in structured text. Research on semantic argument classification requires a large amount of semantically labeled data, called a corpus. Because building a corpus is costly and time-consuming, many recent studies have used an existing corpus as training data to classify semantic arguments in new domains, without building a new corpus for each of those domains.
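As a rough illustration of the WHO did WHAT to WHOM pattern, the sketch below stores one labeled proposition in a small Python structure. The example sentence, the `Argument` class, and the helper `roles_by_label` are hypothetical and serve only as illustration; the role labels follow the PropBank naming convention (ARG0, ARG1, ARG2, ARGM-TMP, ARGM-LOC).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    """One labeled span of a sentence (hypothetical structure for illustration)."""
    text: str
    label: str


# Hypothetical sentence: "Yesterday John gave Mary a book in the library."
PREDICATE = "gave"
ARGUMENTS = [
    Argument("Yesterday", "ARGM-TMP"),       # WHEN
    Argument("John", "ARG0"),                # WHO (the giver)
    Argument("Mary", "ARG2"),                # to WHOM (the recipient)
    Argument("a book", "ARG1"),              # WHAT (the thing given)
    Argument("in the library", "ARGM-LOC"),  # WHERE
]


def roles_by_label(arguments):
    """Map each role label to the text span that fills it."""
    return {arg.label: arg.text for arg in arguments}


if __name__ == "__main__":
    for label, text in roles_by_label(ARGUMENTS).items():
        print(f"{label:9s} {text}")
```

Semantic argument classification, in this picture, is the task of assigning the `label` field to each span once the predicate and the spans are given.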
This study performs semantic argument classification on a new domain, the English translation of the Quran, using the PropBank corpus as training data. Previous studies have shown that performance drops significantly when the training and testing data come from different domains. The main problem arises when an argument found in the testing data does not appear in the training data. One solution is to extend the argument features in the training data so that they accommodate the features of such new arguments. Using a linear SVM, the experiments show that augmenting the baseline system with certain combinations of the proposed features improves semantic argument classification on the Quran data with the PropBank corpus as training data. When tested on automatically labeled data, adding the PTO+SP features to the baseline system improves accuracy by 1.25% and F1 score by 1.30%. When tested on hand-labeled data, adding the PO+PTO feature combination to the baseline system improves accuracy by 0.47% and F1 score by 0.40%.
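The idea of extending the feature set so that out-of-domain arguments still share something with the training data can be sketched as follows. This is a minimal illustration under stated assumptions, not the system used in the study: the feature names (`head`, `predicate`, `phrase`, `position`), the toy examples, and all function names are hypothetical, and they are not the PTO, SP, or PO features reported above. The sketch only shows why a purely lexical feature set can leave a new-domain argument with no feature seen in training, while an augmented set restores some overlap that a linear classifier can use.

```python
# Hypothetical training arguments from a newswire-like source domain.
TRAIN = [
    {"head": "company", "predicate": "buy", "phrase": "NP", "position": "before"},
    {"head": "shares", "predicate": "buy", "phrase": "NP", "position": "after"},
    {"head": "yesterday", "predicate": "buy", "phrase": "ADVP", "position": "after"},
]

# Hypothetical test argument from the new domain: its words are unseen in TRAIN.
TEST_ARG = {"head": "thee", "predicate": "guide", "phrase": "NP", "position": "after"}


def baseline_features(arg):
    """Assumed baseline: lexical features only (head word and predicate)."""
    return {f"head={arg['head']}", f"predicate={arg['predicate']}"}


def extra_features(arg):
    """Assumed additional features that generalize across domains."""
    return {f"phrase={arg['phrase']}", f"position={arg['position']}"}


def augmented_features(arg):
    """Baseline features extended with the additional ones."""
    return baseline_features(arg) | extra_features(arg)


def training_vocabulary(examples, extractor):
    """Collect every feature that occurs at least once in the training data."""
    vocabulary = set()
    for example in examples:
        vocabulary |= extractor(example)
    return vocabulary


def seen_features(arg, examples, extractor):
    """Features of `arg` that also occurred in training. In a linear model over
    one-hot features, only these can contribute to the decision score."""
    return extractor(arg) & training_vocabulary(examples, extractor)


if __name__ == "__main__":
    print("baseline :", sorted(seen_features(TEST_ARG, TRAIN, baseline_features)))
    print("augmented:", sorted(seen_features(TEST_ARG, TRAIN, augmented_features)))
```

With the baseline extractor the test argument shares no feature with the training data, so a linear model has nothing to score it on; with the augmented extractor it shares the phrase-type and position features. Which additional features actually help in a given domain pair remains an empirical question.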