Abstrak
Deteksi objek menggunakan deep learning menghadapi tantangan signifikan pada skenario dunia nyata dengan objek teroklusi. Arsitektur YOLOv8n yang populer untuk deteksi real-time dapat ditingkatkan kinerjanya menggunakan modul perhatian (attention). Penelitian ini bertujuan mengevaluasi pengaruh lokasi penyisipan dua modul perhatian, CBAM (Convolutional Block Attention Module) dan CoordAtt (Coordinate Attention), pada kinerja YOLOv8n dalam mendeteksi objek teroklusi. Fokus evaluasi adalah pada dataset KITTI yang dimodifikasi untuk merepresentasikan tingkat oklusi berbeda pada enam kelas objek gabungan (kendaraan dan pejalan kaki). Permasalahan utama adalah bagaimana lokasi penyisipan (Neck, Backbone, atau Keduanya) dan jenis modul perhatian (CBAM vs CoordAtt) mempengaruhi kemampuan YOLOv8n mengatasi oklusi. Hasil eksperimen secara konsisten menunjukkan penyisipan modul perhatian pada bagian Neck memberikan peningkatan kinerja paling signifikan. Konfigurasi CBAM-Neck mencapai mAP@0.5 tertinggi (0.683) dan mAP@0.5:0.95 (0.476), menunjukkan peningkatan substansial dari baseline (0.659 dan 0.449). Analisis kualitatif mengonfirmasi kemampuan model dengan perhatian di Neck untuk lebih baik mendeteksi objek teroklusi sedang, meskipun oklusi berat tetap menjadi tantangan. Penyisipan pada Backbone atau Both terbukti kurang efektif.
Kata kunci: YOLOv8, CBAM, CoordAtt, deteksi objek, oklusi, perhatian (attention), KITTI.
Abstract
Object detection with deep learning remains difficult in real-world scenes where objects are occluded. YOLOv8n, a popular architecture for real-time detection, can potentially be improved with attention modules. This research evaluates how the insertion location of two attention modules, CBAM (Convolutional Block Attention Module) and CoordAtt (Coordinate Attention), affects YOLOv8n's performance in detecting occluded objects. The evaluation uses a modified KITTI dataset that represents varying occlusion levels across six combined object classes (vehicles and pedestrians). The core question is how the insertion location (Neck, Backbone, or Both) and the attention type (CBAM vs. CoordAtt) affect YOLOv8n's ability to handle occlusion. Experimental results consistently show that inserting attention modules into the Neck yields the largest performance improvement. The CBAM-Neck configuration achieved the highest mAP@0.5 (0.683) and mAP@0.5:0.95 (0.476), gains of 0.024 and 0.027 over the baseline (0.659 and 0.449). Qualitative analysis confirms that models with attention in the Neck detect moderately occluded objects better, although severe occlusion remains challenging. Insertion into the Backbone, or into both Backbone and Neck, was less effective.
Keywords: YOLOv8, CBAM, CoordAtt, object detection, occlusion, attention, KITTI.
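
The size of the reported improvement can be checked with a short calculation. The sketch below is only an illustration: the four scores are copied from the abstract, while the dictionary layout and the helper name `gains` are assumptions introduced here, not part of the study's code.

```python
# Scores copied from the abstract; the layout and key names are illustrative only.
reported = {
    "mAP@0.5": {"baseline": 0.659, "cbam_neck": 0.683},
    "mAP@0.5:0.95": {"baseline": 0.449, "cbam_neck": 0.476},
}


def gains(baseline, improved):
    """Return (absolute gain, relative gain in percent) of improved over baseline."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Compute both kinds of gain for each metric.
results = {
    name: gains(scores["baseline"], scores["cbam_neck"])
    for name, scores in reported.items()
}

for name, (absolute, relative) in results.items():
    print(f"{name}: +{absolute:.3f} absolute, +{relative:.1f}% relative")
# mAP@0.5: +0.024 absolute, +3.6% relative
# mAP@0.5:0.95: +0.027 absolute, +6.0% relative
```

In relative terms the gain is larger on the stricter mAP@0.5:0.95 metric (about 6.0%) than on mAP@0.5 (about 3.6%).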