Answer 0 (score: 0)
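You should use the Google Custom Search API. The free version is limited to the first 100 results.
You can also use jsoup, but layout/CSS changes on the site are more likely to break your selectors than changes to the Custom Search API are to break your client, so the API is the more stable option. In addition, an unusual volume of searches from one IP address might get that IP blocked (this is only speculation), but if your application only runs searches that users trigger manually, that should not be a problem.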
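For the API route, here is a minimal sketch of how the paged requests could be built against the Custom Search JSON API endpoint. The class name `CustomSearchSketch`, its helper methods, and the `YOUR_API_KEY` / `YOUR_ENGINE_ID` placeholders are assumptions made for illustration; you would need your own API key and search engine ID (`cx`). It assumes Java 11 or later for `java.net.http`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any Google client library.
final class CustomSearchSketch {
    private static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";
    private static final int RESULTS_PER_PAGE = 10; // the API returns at most 10 items per request

    // Builds the request URL for a zero-based result page.
    static String buildUrl(String apiKey, String engineId, String query, int page) {
        int start = page * RESULTS_PER_PAGE + 1; // "start" is a 1-based result index
        return ENDPOINT
                + "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&cx=" + URLEncoder.encode(engineId, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&num=" + RESULTS_PER_PAGE
                + "&start=" + start;
    }

    // Fetches one page and returns the raw JSON body; each entry of the
    // "items" array carries a "title" and a "link" field.
    static String fetchPage(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the request URLs; call fetchPage(...) once real credentials are in place.
        for (int page = 0; page < 3; page++) {
            System.out.println(buildUrl("YOUR_API_KEY", "YOUR_ENGINE_ID", "jsoup examples", page));
        }
    }
}
```

The response is JSON, so no CSS selectors are involved; parse it with whatever JSON library your project already uses.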
The jsoup approach:
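import java.io.IOException;
import java.net.URLEncoder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GoogleSearchScraper {
    public static void main(String[] args) throws IOException {
        String searchTerm = "jsoup examples";
        int numberOfResultPages = 5;
        // URL-encode the query so spaces and special characters are handled
        String searchUrl = "https://www.google.com/search?q=" + URLEncoder.encode(searchTerm, "UTF-8") + "&start=";
        for (int i = 0; i < numberOfResultPages; i++) {
            // "start" is a result offset, not a page index; with 10 results per page, page i starts at i * 10
            Document doc = Jsoup.connect(searchUrl + (i * 10))
                    .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36")
                    .referrer("https://www.google.com/")
                    .get(); // note: does not work without a userAgent
            // this selector depends on Google's page markup and may need updating when the layout changes
            for (Element result : doc.select("h3.r a")) {
                String title = result.text();
                String url = result.attr("href");
                // just printing out title and link to demonstrate the approach
                System.out.println(title + " -> " + url);
            }
        }
    }
}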
Output:
Example program: list links: jsoup Java HTML parser -> https://jsoup.org/cookbook/extracting-data/example-list-links
Extract attributes, text, and HTML from elements: jsoup Java HTML ... -> https://jsoup.org/cookbook/extracting-data/attributes-text-html
Use selector-syntax to find elements: jsoup Java HTML parser -> https://jsoup.org/cookbook/extracting-data/selector-syntax
...