When I try to parse an HTML page with Jsoup, I get a SocketTimeoutException:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
at app.ForumCrawler.crawl(ForumCrawler.java:50)
at Main.main(Main.java:15)
I use the following code to fetch and parse the page, because I need to check the response status code (200, 404, and so on).
String userAgent = "Mozilla/5.0 (jsoup)";
int timeout = 5 * 1000;
Document localDoc = null;
String url = "<url>";
Connection.Response response = Jsoup.connect(url).userAgent(userAgent).timeout(timeout).execute();
if (response.statusCode() == 200) {
    localDoc = Jsoup.parse(response.body());
    // do the stuff..
}
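I have read that using .get() instead of .execute() gets rid of the SocketTimeoutException, but if I use .get() I cannot get the response.
Please suggest which one I should use so that I avoid the SocketTimeoutException and still get the response when parsing the page.
Thanks in advance.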
Answer 0 (score: 0)
"I have read that using .get() instead of .execute() gets rid of the SocketTimeoutException"
As you can see from the call stack in your post, the get() method calls execute() internally:
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)<-- execute() called
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)<-- ...from get().
at app.ForumCrawler.crawl(ForumCrawler.java:50)
So switching to get() will not get rid of the SocketTimeoutException. However, you can make your code more robust by handling this exception.
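One way to handle it is to retry the request a few times when it times out. The following is only a minimal sketch, not something taken from Jsoup itself: the TimeoutRetry class, the withRetry method and the maxAttempts parameter are hypothetical names made up for illustration. It retries only on SocketTimeoutException and lets every other exception propagate immediately.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Jsoup): retries a request that times out.
final class TimeoutRetry {
    private TimeoutRetry() {}

    // Calls request up to maxAttempts times and returns the first successful result.
    // Only SocketTimeoutException triggers a retry; any other exception is rethrown at once.
    static <T> T withRetry(Callable<T> request, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException lastTimeout = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (SocketTimeoutException e) {
                // Remember the timeout and try again if attempts remain.
                lastTimeout = e;
            }
        }
        // Every attempt timed out: surface the last timeout to the caller.
        throw lastTimeout;
    }
}
```

With the code from the question, the call could look like TimeoutRetry.withRetry(() -> Jsoup.connect(url).userAgent(userAgent).timeout(timeout).execute(), 3). Because this still goes through execute(), you keep the Connection.Response and can check response.statusCode() as before. Passing a larger value to timeout() may also help if the server is simply slow.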