I want to run a Google search with Jsoup, but I get this error:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=503, URL=https://ipv4.google.com/sorry/index?continue=https://www.google.com/search%253Fq%253DJ%252520%2526%252520K%252520Pumpenservice%252520Erfurt%252520Erfurt%2526num%253D1&q=EgTVmVsYGJCJucoFIhkA8aeDS70kBAG8SxBgsgwr3uhzT435x_KnMgNyY24
So what can I do? I already tried waiting a random 15 to 26 seconds before each search, but it doesn't help.
Here is my code:
    ...
        for (String s : haendler) {
            // wait a random 15-26 s between searches
            Thread.sleep(ThreadLocalRandom.current().nextInt(15000, 26000));
            // note: the loop variable s is not used here
            startJavaSearchGoogle(cellValueMay, street);
        }
    }
    public void startJavaSearchGoogle(String suche, String street) throws IOException {
        try {
            // URL-encode the search terms so characters such as '&' stay inside the q parameter
            String query = java.net.URLEncoder.encode(suche + " " + street, "UTF-8");
            String searchURL = GOOGLE_SEARCH_URL + "?q=" + query + "&num=" + 1;
            System.out.println("########################################");
            System.out.println(searchURL);
            System.setProperty("http.proxyHost", "127.0.0.1");
            System.setProperty("http.proxyPort", "8080");
            Document doc = Jsoup.connect(searchURL)
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
                    .timeout(5000)
                    .get();
            // select the links inside the result titles
            Elements results = doc.select("h3.r > a");
            for (Element result : results) {
                String linkHref = result.attr("href");
                String linkText = result.text();
                System.out.println("Text::" + linkText + ", URL::" + linkHref);
                new Screenshot().make(linkHref);
            }
            System.out.println("########################################");
        } catch (Exception e) {
            // an HTTP 503 from Google ends up here as org.jsoup.HttpStatusException
            e.printStackTrace();
        }
    }
Edit:
Opening the URL from the error message in a browser shows the following: