Analysis of the Application of the Cover Coefficient-Based Incremental Clustering Methodology (C2ICM) Algorithm in Text Document Clustering

Harista Myke Berlina

Basic Information

Catalog No.

113061017

Classification

005.1

Catalog type

Scholarly Work - Undergraduate Thesis (S1) - Reference

Abstract

This final project applies the Cover Coefficient-Based Incremental Clustering Methodology (C2ICM) to two document collections from the SMART dataset, ADI and CISI. The collections are processed to form clusters, which turn out to vary in quality. The stages are preprocessing, construction of the D-Matrix, computation of the C-Matrix, computation of the number of clusters, selection of seed documents, and assignment of nonseed documents to the selected seed documents.

When new documents enter the database, incremental maintenance is performed: the D-Matrix, C-Matrix, number of clusters, and seed power are recomputed for all documents in the database. If an old seed document is selected as a seed again, the cluster previously formed around it is retained. If it is not selected again, its cluster is deleted and new clusters are formed around the new seed documents.

After clusters have been formed for all queries, cluster quality is measured with the Silhouette Coefficient (SC), and the cluster formation time is recorded in seconds. In the experiments, the clusters produced by C2ICM varied in quality, and execution time grew linearly with the number of hitlist documents.

Keywords: incremental, cluster, seed, cover coefficient, silhouette coefficient, cluster quality
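The stages named in the abstract (C-Matrix, number of clusters, seed selection, nonseed assignment, incremental maintenance) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes the commonly cited cover coefficient formulation for a binary D-Matrix, where c_ij = alpha_i * sum_k(d_ik * beta_k * d_jk) with alpha_i and beta_k the reciprocals of the row and column sums, the number of clusters is the sum of the diagonal entries, and seed power is delta_i * (1 - delta_i) * (row sum). Rounding the cluster count, the function names, the toy matrix, and the omission of a ragbag cluster and of tie handling between near-identical seeds are all assumptions made for illustration.

```python
def cover_coefficients(D):
    """C[i][j] = alpha_i * sum_k D[i][k] * beta_k * D[j][k].

    Assumes every document has at least one term and every term
    occurs in at least one document (no zero row or column sums).
    """
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                             # reciprocal row sums
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]   # reciprocal column sums
    return [[alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
             for j in range(m)] for i in range(m)]


def select_seeds(D, C):
    """Pick the n_c documents with the highest seed power."""
    m = len(D)
    delta = [C[i][i] for i in range(m)]            # decoupling coefficients
    n_clusters = max(1, round(sum(delta)))         # rounding is an assumption
    power = [delta[i] * (1.0 - delta[i]) * sum(D[i]) for i in range(m)]
    ranked = sorted(range(m), key=lambda i: power[i], reverse=True)
    return ranked[:n_clusters]


def build_clusters(D, old_clusters=None):
    """Cluster all documents in D; reuse old clusters whose seed is reselected.

    old_clusters maps an old seed index to its member indices. Clusters
    whose seed is no longer a seed are dropped and their members, like
    any new documents, are assigned to the seed that covers them most.
    """
    C = cover_coefficients(D)
    seeds = select_seeds(D, C)
    clusters = {s: [s] for s in seeds}
    placed = set(seeds)
    for seed, members in (old_clusters or {}).items():
        if seed in clusters:                       # old seed reselected: keep its cluster
            for doc in members:
                if doc not in placed:
                    clusters[seed].append(doc)
                    placed.add(doc)
    for doc in range(len(D)):
        if doc not in placed:                      # nonseed, displaced, or new document
            best = max(seeds, key=lambda s: C[doc][s])
            clusters[best].append(doc)
    return {s: sorted(members) for s, members in clusters.items()}


# Toy binary D-Matrix (hypothetical): rows are documents, columns are terms.
D = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
initial = build_clusters(D)
# A new document arrives; everything is recomputed over the enlarged matrix.
updated = build_clusters(D + [[1, 1, 0, 0, 0, 0]], old_clusters=initial)
print(initial)
print(updated)
```

On this toy matrix, documents 0 and 3 are selected as seeds both before and after the new document arrives, so both existing clusters are kept and only the new document needs to be assigned. Cluster quality could then be scored separately with a silhouette measure, which this sketch does not cover.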