Design and Research of Distributed Web Crawler Based on Knowledge Graph
With the rapid growth of the Internet, related services and information are also expanding quickly. As this information becomes more widely used, users' requirements for it keep rising, and the web crawler, which is responsible for collecting information from the Internet, faces great challenges. At present, large Internet companies and research institutions at home and abroad have produced some relatively mature solutions, but most of them provide only a general-purpose search service that cannot be customized and can no longer satisfy users' increasingly diverse requirements. A distributed web crawler, with its flexible collection speed and scale, can meet these varied needs. After analyzing the key points of existing distributed web crawler solutions, this paper designs a distributed web crawler architecture based on a knowledge graph and proposes a practical algorithm that addresses the key technologies of distributed crawling, realizing a robust distributed web crawler system. Finally, the crawler is tested, including a test of the basic crawler and a test of the distributed web crawler system.
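The paper's own algorithm is not given in the abstract; as a minimal, illustrative sketch of the distributed-crawler pattern it describes — several workers pulling URLs from a shared frontier while a shared "seen" set prevents duplicate collection — one might write something like the following. The names (`crawl`, `fetch`, `frontier`) and the thread-based setup are assumptions for illustration, not the paper's implementation; a real deployment would distribute the frontier across machines (e.g. via a message queue) and `fetch` would actually download and parse pages.

```python
import queue
import threading

def crawl(seed_urls, fetch, max_pages=100, num_workers=4):
    """Illustrative distributed-style crawl: worker threads share one URL frontier.

    `fetch(url)` must return the list of links found on that page; here it is
    injected so the sketch stays self-contained (no real network access).
    """
    frontier = queue.Queue()   # shared URL frontier
    seen = set()               # de-duplication of discovered URLs
    lock = threading.Lock()    # guards `seen` and `results`
    results = []               # URLs actually crawled, in crawl order

    for url in seed_urls:
        seen.add(url)
        frontier.put(url)

    def worker():
        while True:
            try:
                url = frontier.get(timeout=0.2)
            except queue.Empty:
                return  # frontier drained: this worker stops
            with lock:
                if len(results) >= max_pages:
                    return  # crawl budget exhausted
                results.append(url)
            # Discover outlinks and enqueue only unseen ones.
            for link in fetch(url):
                with lock:
                    if link not in seen:
                        seen.add(link)
                        frontier.put(link)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, crawling a small in-memory link graph with `fetch = lambda u: graph.get(u, [])` visits every reachable page exactly once, which is the property the shared frontier and `seen` set are meant to guarantee.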