A major bottleneck in the widespread deployment of indoor localization technologies based on passive signal fingerprints (e.g., received signal strength (RSS) and magnetic field) is the extensive human effort required to construct and update the fingerprint database. In this paper, we propose an accurate visual-inertial integrated geo-tagging method that collects fingerprints and constructs the radio map from the crowdsourced trajectories of smartphone users. By integrating multisource information from smartphone sensors (e.g., camera, accelerometer, and gyroscope), the system accurately reconstructs the geometry of the trajectories. We further propose an algorithm that estimates the spatial location of each trajectory in the reference coordinate system and constructs both the radio map and a geo-tagged image database for indoor positioning. Given only a few initial reference points, the algorithm can operate in an unknown indoor environment without prior knowledge of the floorplan or of the initial locations of the crowdsourced trajectories. Experimental results show that the average calibration error of the fingerprints is 0.67 m. To evaluate the constructed multisource database, we use a weighted k-nearest neighbor (WKNN) method (without any optimization) and an image matching method. The average localization errors of RSS-based and image-based positioning are 3.2 m and 1.2 m, respectively, indicating that the quality of the constructed radio map is comparable to that of radio maps built by site surveying. Compared with traditional site surveying, the system greatly reduces labor cost while requiring minimal external information.
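The abstract does not specify the distance metric, the value of k, or the weighting scheme used in the WKNN evaluation. The following is a minimal sketch of a generic, unoptimized WKNN positioning step, assuming Euclidean distance in RSS space and inverse-distance weights. The function name `wknn_locate`, the `radio_map` layout of `((x, y), rss_vector)` pairs, and the default `k=3` are hypothetical choices made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical radio-map layout: each entry is ((x, y), rss_vector), where
# rss_vector[i] is the RSS (dBm) of access point i at reference point (x, y).
RadioMap = List[Tuple[Tuple[float, float], Sequence[float]]]


def wknn_locate(query_rss: Sequence[float], radio_map: RadioMap,
                k: int = 3, eps: float = 1e-6) -> Tuple[float, float]:
    """Estimate (x, y) with weighted k-nearest neighbors (assumed variant).

    Assumptions: Euclidean distance in RSS space and weights 1 / (d + eps);
    eps avoids division by zero when the query matches a fingerprint exactly.
    """
    if not radio_map:
        raise ValueError("radio map is empty")
    scored = []
    for pos, fingerprint in radio_map:
        if len(fingerprint) != len(query_rss):
            raise ValueError("fingerprint and query have different AP counts")
        # Euclidean distance between the query and this stored fingerprint.
        d = math.sqrt(sum((q - r) ** 2 for q, r in zip(query_rss, fingerprint)))
        scored.append((d, pos))
    # Keep the k reference points whose fingerprints are closest to the query.
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Inverse-distance weights: closer fingerprints contribute more.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return (x, y)


if __name__ == "__main__":
    # Toy radio map with four reference points and three access points.
    toy_map = [
        ((0.0, 0.0), [-40, -70, -80]),
        ((5.0, 0.0), [-60, -50, -75]),
        ((0.0, 5.0), [-65, -72, -45]),
        ((5.0, 5.0), [-70, -55, -50]),
    ]
    print(wknn_locate([-55, -60, -70], toy_map, k=3))
```

In a crowdsourced setting such as the one described here, `radio_map` would be populated from the geo-tagged fingerprints rather than from a site survey; the positioning step itself is unchanged.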