Tourism is a rapidly growing sector that contributes significantly to a country’s national income. Indonesia’s GDP is expected to grow in the coming years, and tourism is a major contributor to this growth. To address the high demand for tourism, we propose a personalized tourism route recommendation system that assists tourists in planning their itineraries. Route planning can be modelled as a Traveling Salesman Problem (TSP), which can be formulated as a Markov Decision Process and solved with reinforcement learning. In this paper, we propose a method that uses Q-learning to generate N-day tourism routes in the Special Region of Yogyakarta. Our approach includes time constraints so that each tour fits within a specified time frame and respects the operating hours of tourist attractions. Additionally, our method uses Multi-Attribute Utility Theory to combine several attributes, such as rating, travel time, and cost, into a cost function that reflects the user’s needs and preferences. The proposed method was compared with the Firefly algorithm in multiple experiments to assess its performance. The experimental results show that the proposed method outperforms the Firefly algorithm by 42.89% in generating tours.
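As an illustration only, the following minimal sketch shows how a Multi-Attribute Utility Theory score could serve as the reward in tabular Q-learning over a TSP-style state (current location plus the set of visited attractions). It is not the implementation evaluated in this paper: the attraction data, travel times, weights, hyperparameters, and the names `maut_utility` and `q_learning_route` are all hypothetical, and operating hours and the split into N days are omitted for brevity.

```python
import random

# Hypothetical attractions (rating out of 5, entry cost); illustrative values only.
ATTRACTIONS = {
    "A": {"rating": 4.8, "cost": 50.0},
    "B": {"rating": 4.2, "cost": 10.0},
    "C": {"rating": 3.9, "cost": 0.0},
}
# Hypothetical symmetric travel times in minutes; "H" is the starting hotel.
TRAVEL = {("H", "A"): 30, ("H", "B"): 20, ("H", "C"): 45,
          ("A", "B"): 15, ("A", "C"): 25, ("B", "C"): 40}
MAX_COST = 50.0   # assumed normalisation bounds
MAX_TIME = 45.0


def travel_time(a, b):
    """Look up the symmetric travel time between two locations."""
    return TRAVEL[(a, b)] if (a, b) in TRAVEL else TRAVEL[(b, a)]


def maut_utility(place, prev, weights):
    """Additive utility: each attribute is scaled to [0, 1], then weighted."""
    rating_u = ATTRACTIONS[place]["rating"] / 5.0
    cost_u = 1.0 - ATTRACTIONS[place]["cost"] / MAX_COST
    time_u = 1.0 - travel_time(prev, place) / MAX_TIME
    return (weights["rating"] * rating_u
            + weights["cost"] * cost_u
            + weights["time"] * time_u)


def q_learning_route(weights, episodes=2000, alpha=0.1, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Learn a visiting order with tabular Q-learning and return it."""
    rng = random.Random(seed)
    places = sorted(ATTRACTIONS)
    q = {}  # maps ((current, visited), action) to an estimated value

    for _ in range(episodes):
        current, visited = "H", frozenset()
        while len(visited) < len(places):
            state = (current, visited)
            actions = [p for p in places if p not in visited]
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            reward = maut_utility(action, current, weights)
            next_visited = visited | {action}
            next_state = (action, next_visited)
            next_actions = [p for p in places if p not in next_visited]
            best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                            default=0.0)
            # Standard Q-learning update.
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            current, visited = action, next_visited

    # Greedy rollout of the learned Q-table.
    route, current, visited = [], "H", frozenset()
    while len(visited) < len(places):
        state = (current, visited)
        actions = [p for p in places if p not in visited]
        action = max(actions, key=lambda a: q.get((state, a), 0.0))
        route.append(action)
        current, visited = action, visited | {action}
    return route


if __name__ == "__main__":
    print(q_learning_route({"rating": 0.4, "cost": 0.2, "time": 0.4}))
```

In this sketch, changing the weights shifts the learned route toward higher-rated, cheaper, or closer attractions, which is the role user preferences play in the proposed method.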