ABSTRACT: Labeling data is expensive, so a system is needed in which data can be labeled easily and accurately. Semi-supervised clustering is a learning technique that clusters, or labels, unsupervised data using supervised data as a reference. HMRF-KMeans is a semi-supervised clustering algorithm that uses a Hidden Markov Random Field to sample and observe the supervised data at random, to estimate the underlying probabilities through hidden parameters, and to use these data as a reference for clustering. HMRF-KMeans combines constraint-based and distance-based learning in a single objective function, and minimizing this objective function yields good-quality clusters. The constraint-based component provides accurate centroids in the initialization process, and distance learning helps to minimize the HMRF-KMeans objective function.
Keywords: cost, semi-supervised clustering, HMRF-KMeans, algorithm, supervised, unsupervised, constraint, distance.
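To make the idea of an objective function that combines distances and constraints concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the constraints take the form of must-link and cannot-link pairs, that the distance is a fixed squared Euclidean distance (so no distance learning takes place), and that every violated constraint costs the same constant weight `w`. The function name `constrained_objective` and all sample values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_objective(
    points: List[Point],
    labels: List[int],
    centroids: List[Point],
    must_link: List[Pair],
    cannot_link: List[Pair],
    w: float = 1.0,  # assumed constant penalty per violated constraint
) -> float:
    """Simplified constrained K-Means objective (illustrative only).

    Sum of point-to-centroid distortions, plus a penalty for each
    must-link pair placed in different clusters and for each
    cannot-link pair placed in the same cluster.
    """
    distortion = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    ml_penalty = sum(w for i, j in must_link if labels[i] != labels[j])
    cl_penalty = sum(w for i, j in cannot_link if labels[i] == labels[j])
    return distortion + ml_penalty + cl_penalty


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centroids = [(0.0, 0.5), (5.0, 5.5)]
must_link = [(0, 1)]    # points 0 and 1 should share a cluster
cannot_link = [(1, 2)]  # points 1 and 2 should be separated

consistent = [0, 0, 1, 1]  # respects both constraints
violating = [0, 1, 1, 1]   # breaks both constraints

j_consistent = constrained_objective(points, consistent, centroids, must_link, cannot_link)
j_violating = constrained_objective(points, violating, centroids, must_link, cannot_link)
print(j_consistent, j_violating)
```

In this toy setting the labeling that respects the constraints gives the lower objective value, which illustrates the abstract's claim that a smaller objective corresponds to a better clustering. A fuller treatment would also update the distance measure during clustering, which this sketch deliberately leaves out.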