Voice navigation systems have recently become virtually ubiquitous. However, current systems are not always convenient for pedestrians because users still have to look at the map on their mobile devices while walking. This paper proposes a mobile voice navigation system that automatically generates navigation sentences from guide annotations that users add to a mobile map. Because it is difficult to realize user-friendly navigation with voice alone, the proposed system guides users along routes using both voice and photos. Furthermore, because the appearance of a landmark varies with the point from which it is observed, the system uses the relations between route nodes and landmarks to provide guidance appropriate to the user's current viewpoint. In addition, because having a single manager input all guide annotation details would be tedious and time-consuming, the system collects guide annotations, including photos and landmarks, via smartphone from users who know the area well. Experiments evaluating both the guide annotations added by users and the content of the generated navigation show that the system is effective.