Java - JSoup - HTTP error fetching URL. Status=400

Time: 2017-09-01 09:03:28

Tags: java jsoup screen-scraping http-status-code-400

While fetching results from duckduckgo.com with different queries, I get this exception after 20-30 iterations:

Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=400, URL=https://duckduckgo.com/html/?q=  Hermann_William_Goering
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:682)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:629)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:261)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:250)
at WebContextExtractor.DDGresultsScraping(WebContextExtractor.java:378)
at WebContextExtractor.main(WebContextExtractor.java:521)

I don't know what the problem is. If I open that link manually via Google search, I can reach it without any problem.

The error occurs when I try to get the document from the page with the following simple code:

Connection conn = Jsoup.connect(DUCKDUCKGO_SEARCH_URL + query)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                    + "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"); 

Document doc = conn.get(); // <-- the exception is thrown here
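
One thing that may be worth ruling out: the URL in the exception message shows two spaces between `q=` and the search term, which suggests the query is concatenated onto the URL without trimming or encoding. The sketch below uses only the standard library to trim and percent-encode the query before building the URL. The class name `QueryUrlBuilder` and the method `buildSearchUrl` are hypothetical, and the value of `DUCKDUCKGO_SEARCH_URL` is assumed from the URL in the stack trace. This is not a confirmed fix; the 400 may have another cause, such as the server rejecting repeated automated requests.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryUrlBuilder {

    // Assumed value, inferred from the URL in the exception message.
    static final String DUCKDUCKGO_SEARCH_URL = "https://duckduckgo.com/html/?q=";

    // Hypothetical helper: trims surrounding whitespace and
    // percent-encodes the query before appending it to the base URL.
    static String buildSearchUrl(String query) {
        try {
            String encoded = URLEncoder.encode(query.trim(), StandardCharsets.UTF_8.name());
            return DUCKDUCKGO_SEARCH_URL + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The leading spaces mirror the query seen in the failing URL.
        System.out.println(buildSearchUrl("  Hermann_William_Goering"));
    }
}
```

The result could then be passed to `Jsoup.connect(buildSearchUrl(query))` in place of the plain string concatenation.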

0 answers:

No answers yet