ABSTRAKSI (Indonesian abstract): Automatic text summarization is a computer-based application that can summarize one or more texts by filtering out the most important information they contain. The resulting summary can be used directly, without further editing. There are two types of text summarization: abstraction and extraction. The abstraction type summarizes a text by paraphrasing it, whereas the extraction type only copies the information judged important from the text.
This final project implements an extraction method, Maximal Marginal Importance (MMI), which assigns weights to the features of each sentence in a text after the text has been preprocessed. Once every sentence has a weight, the sentences are grouped with the k-means clustering algorithm based on the similarity between pairs of sentences. One or more binary trees of sentences are then built from each resulting cluster. These trees determine the level at which each sentence of a cluster is located, and that level is included in the computation of the sentence's MMI value. The sentence with the highest MMI value in each cluster is selected for the summary. Finally, the summary sentences are reordered by their original index if they are not already in order.
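The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: sentences are assumed to be bag-of-words term-frequency vectors, similarity is assumed to be cosine similarity, and k-means is run in its common text variant where each sentence joins the centroid it is most similar to (rather than nearest in Euclidean distance). The stop-word list `STOPWORDS`, the helper names, the random seeding, and the choice of `k` are all hypothetical; the MMI feature weighting is not reproduced because the abstract does not define the features.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real system would use
# a complete Indonesian or English list.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "dan", "di", "yang"}


def preprocess(sentence):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Centroid (component-wise mean) of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the weights of v into the running total
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_sentences(sentences, k, max_iter=20, seed=0):
    """Cluster sentences with cosine-similarity k-means.

    Returns one cluster label per sentence. Requires k <= len(sentences).
    """
    vectors = [dict(Counter(preprocess(s))) for s in sentences]
    rng = random.Random(seed)
    # Initialise the centroids with k distinct sentences chosen at random.
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sentence joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering converged
        labels = new_labels
        # Update step: recompute the centroid of every non-empty cluster.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels
```

For example, `kmeans_sentences(["Cats chase mice in the garden.", "Cats like the garden and mice.", "Stock markets fell today.", "Markets fell as stock prices dropped."], k=2)` puts the two sentences about cats in one cluster and the two about markets in the other; the label numbers themselves depend on the random initialisation.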
The evaluation was carried out with the ROUGE Evaluation Toolkit. With ROUGE, the accuracy of the generated summary (the candidate summary) against the reference summary is measured in terms of recall, precision, and F-measure. The evaluation results show that the content of the candidate summaries is close to that of the reference summaries according to the ROUGE scores.
Keywords: automatic text summarization, extraction, MMI, similarity, k-means clustering, ROUGE
ABSTRACT: Automatic text summarization is a computer-based application that can summarize one or more texts by extracting the most important information they contain. The resulting summary can be used directly, without further editing. There are two types of text summarization: abstraction and extraction. The abstraction type summarizes a text by paraphrasing it, while the extraction type only copies the information considered important from the text.
In this final project, automatic text summarization is implemented with an extraction approach using Maximal Marginal Importance (MMI), a method that assigns weights to the sentence features of a text after preprocessing. Once each sentence has a weight, the sentences are grouped with the k-means clustering algorithm based on the similarity between pairs of sentences. One or more binary trees of sentences are then built from each cluster to determine the level at which each sentence of the cluster is located; this level is later included in the computation of the sentence's MMI value. The sentence with the highest MMI value in each cluster is selected for the summary. In the final step, the summary sentences are sorted by their original index if they are not already in order.
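The selection and ordering steps can be sketched as below. This is a minimal sketch under an explicit assumption: the MMI value of every sentence is taken as precomputed input, because the abstract does not give the MMI formula or the rule for building the binary trees whose levels feed into it. The function names `select_summary` and `build_summary` are hypothetical.

```python
def select_summary(labels, mmi_scores):
    """Return the indices of the summary sentences in document order.

    labels[i]: cluster of sentence i (i is its position in the text).
    mmi_scores[i]: precomputed MMI value of sentence i (assumed input).
    """
    best = {}  # cluster label -> index of its highest-MMI sentence so far
    for idx, (label, score) in enumerate(zip(labels, mmi_scores)):
        # Strict ">" keeps the earlier sentence when two scores tie.
        if label not in best or score > mmi_scores[best[label]]:
            best[label] = idx
    # Ordering step: restore the original sentence order.
    return sorted(best.values())


def build_summary(sentences, labels, mmi_scores):
    """Join the selected sentences into the final extractive summary."""
    return " ".join(sentences[i] for i in select_summary(labels, mmi_scores))
```

With `labels = [0, 1, 1, 0]` and `mmi_scores = [0.1, 0.8, 0.3, 0.6]`, cluster 0 contributes sentence 3 and cluster 1 contributes sentence 1; the winners are discovered in cluster order as 3, then 1, and the final sort returns them in document order as `[1, 3]`.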
The evaluation was performed with the ROUGE Evaluation Toolkit, which measures the accuracy of the generated summary (the candidate summary) against a reference summary in terms of recall, precision, and F-measure. The evaluation results show that the automatically generated summaries are similar to the reference summaries according to the ROUGE scores.
Keywords: automatic text summarization, extraction, MMI, similarity, k-means clustering, ROUGE
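To make the three ROUGE parameters concrete, the sketch below computes ROUGE-N recall, precision, and F-measure from clipped n-gram overlap for a single reference. It is an illustration only, not the ROUGE Evaluation Toolkit: it omits toolkit options such as stemming and multiple references, and its simple lowercase word tokenizer is an assumption. The F-measure uses the weighted form (1 + β²)·P·R / (R + β²·P), which reduces to the harmonic mean when β = 1.

```python
import re
from collections import Counter


def rouge_n(candidate, reference, n=1, beta=1.0):
    """Return (recall, precision, f_measure) of ROUGE-N for one reference."""

    def ngrams(text):
        tokens = re.findall(r"\w+", text.lower())
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # "&" keeps the minimum count of each n-gram, i.e. clipped overlap.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if recall == 0.0 or precision == 0.0:
        return recall, precision, 0.0
    f = (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
    return recall, precision, f
```

For the candidate "the cat sat on the mat" against the reference "the cat was on the mat", five of the six unigrams overlap, so ROUGE-1 recall, precision, and F-measure are all 5/6; for bigrams, three of five overlap, giving 3/5.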