25.05.675
000 - General Works
Scientific Work - Thesis (Master's, S2) - Reference
Artificial Intelligence
61 times
The development of open-vocabulary object detection (OVOD) has enabled object recognition from free-form textual prompts. However, most existing OVOD systems, including Grounding DINO, remain constrained to English-language datasets and encoders, limiting their applicability in non-English contexts. This research explores the adaptation of Grounding DINO by replacing its default BERT text encoder with IndoBERT, a monolingual language model trained on Indonesian corpora. Using a manually translated subset of the COCO val2017 dataset, this study evaluates the performance of three model configurations: (1) English captions with BERT, (2) Indonesian captions with BERT, and (3) Indonesian captions with IndoBERT. The IndoBERT-enhanced model achieved a precision of 0.758, an F1-score of 0.21, and an mAP@0.5:0.9 of 0.132, outperforming the baselines in aligning Indonesian prompts with visual objects. These findings support the feasibility of developing vision-language models tailored to low-resource languages, emphasizing the role of monolingual encoders in cross-modal alignment.
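The abstract reports precision and F1-score but not recall. As a rough reading aid, the sketch below back-solves recall from those two figures. It assumes the F1-score is the standard harmonic mean of precision and recall and that both numbers were measured at the same operating point; the abstract confirms neither. The function names are illustrative and do not come from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Assumption: F1 is the standard harmonic mean (not stated in the abstract).
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall: F1 must be less than 2 * precision")
    return f1 * precision / denominator


# Figures reported in the abstract for the IndoBERT configuration.
recall = implied_recall(precision=0.758, f1=0.21)
print(f"implied recall: {recall:.3f}")  # implied recall: 0.122
```

Under these assumptions the implied recall is about 0.12, which would mean the model's detections are usually correct but that it finds only a small share of the annotated objects.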
Available: 1 of 1 copies in the collection
Name | DIVA ANINDITHA |
Type | Individual |
Editor | Suryo Adhi Wibowo, Koredianto Usman |
Translator |
Name | Universitas Telkom, Master's Program (S2) in Electrical Engineering |
City | Bandung |
Year | 2025 |
Rental price | IDR 0,00 |
Daily fine | IDR 0,00 |
Type | Non-circulating |