Question

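I'm currently working on a project that uses Apache HttpClient 4.1.2 to retrieve some data from a website.

What the application does: it loads a web page and then moves on to the next page it finds, until it reaches the end (for example: go to page 1 -> find 20 more pages -> go to the next of those 20 pages). The problem is that it gets stuck while retrieving some random pages and does not continue crawling.

Here is some of the code: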

DefaultHttpClient mainHttp;
HttpPost post;
HttpResponse response;
HttpEntity entity;
String s;
int curPage = 1;
int index = 0;
boolean ok = true;

...

while (ok) { 
  response = mainHttp.execute(post);
  entity = response.getEntity();
  if (entity != null) {
    System.out.println("Enter " + curPage);
    s = EntityUtils.toString(entity);
    System.out.println("Exit " + curPage);
    index = s.indexOf("[" + curPage + "]");
    if (index > 0) {
      parseContent();
    } else {
      ok = false;
    }                
  }
}

The debug window shows the following:

Enter 1
Exit 1
.
.
.
Enter n

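I'm also using an HTTP request analyzer, and I can see that on the page where it gets stuck, the data is not fully retrieved (it never reaches </html>, i.e. the end of the page).

What can I do in this situation to skip or retry the download? Can anyone help me?

Thanks!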

LE

The current settings are:

mainHttp.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(1, true));
mainHttp.getParams().setParameter("http.connection-manager.timeout", 15000);
mainHttp.getParams().setParameter("http.socket.timeout", 15000);
mainHttp.getParams().setParameter("http.connection.timeout", 15000);

where 15000 is the timeout in milliseconds.

Thanks for your help.
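For context on the `http.socket.timeout` setting above: it corresponds to the socket read timeout (SO_TIMEOUT), that is, the longest a single blocking read may wait for data. The sketch below uses plain `java.net` sockets rather than HttpClient to show what that timeout does when a server sends part of a page and then goes silent. The class and method names (`StalledReadDemo`, `readFromStalledServer`) are made up for this illustration, and whether a stalled server is actually the cause of the hang described here is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Stand-alone illustration with plain java.net sockets (no HttpClient involved).
final class StalledReadDemo {

    /**
     * Starts a local server that sends a partial body and then goes silent,
     * reads from it with the given read timeout, and reports how the read ended.
     */
    static String readFromStalledServer(int soTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0)) {            // 0 = any free port
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    OutputStream out = peer.getOutputStream();
                    out.write("<html><body>partial".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                    Thread.sleep(soTimeoutMillis * 5L);              // stall without closing
                } catch (IOException | InterruptedException ignored) {
                    // the demo only cares about the client side
                }
            });
            serverThread.setDaemon(true);
            serverThread.start();

            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis);                // max wait per blocking read
                InputStream in = client.getInputStream();
                byte[] buf = new byte[1024];
                int total = 0;
                try {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                    return "complete after " + total + " bytes";
                } catch (SocketTimeoutException e) {
                    // The read waited longer than soTimeoutMillis without receiving data.
                    return "timed out after " + total + " bytes";
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `StalledReadDemo.readFromStalledServer(200)` should end with a timeout rather than a hang. Note that the timeout applies to each read separately, so a server that keeps trickling a few bytes at a time would not trigger it.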

Answer 1

DefaultMethodRetryHandler retryhandler = new DefaultMethodRetryHandler(1, true);
mainHttp.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, retryhandler);

Source: http://hc.apache.org/httpclient-3.x/tutorial.html (Method recovery)

However, it will only retry when an exception occurs, so try checking for IOExceptions every time you make a request.
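As a rough sketch of the "check for IOExceptions on every request" advice, the helper below wraps one page download in a try/catch, retries a limited number of times, and returns `null` so the crawl loop can skip the page instead of stopping. It is plain Java and not part of HttpClient; `PageRetry`, `PageFetcher` and `fetchWithRetry` are hypothetical names introduced only for this example.

```java
import java.io.IOException;

// Hypothetical helper, not part of HttpClient: retries a page fetch a few times,
// then gives up so the caller can skip the page.
final class PageRetry {

    /** One attempt at downloading a page body; may fail with an IOException. */
    interface PageFetcher {
        String fetch() throws IOException;
    }

    /**
     * Calls the fetcher up to maxAttempts times.
     * Returns the page body, or null if every attempt threw an IOException.
     */
    static String fetchWithRetry(PageFetcher fetcher, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.fetch();
            } catch (IOException e) {
                // SocketTimeoutException (a stalled read) is an IOException and lands here too.
                System.err.println("Attempt " + attempt + " failed: " + e);
            }
        }
        return null; // the caller decides whether to skip this page or stop crawling
    }
}
```

In the loop from the question, the fetcher body would presumably be the `mainHttp.execute(post)` call followed by `EntityUtils.toString(entity)`, and a `null` result would mean moving on to the next page. Aborting the request after a failed attempt (for example with `post.abort()`) is likely also needed so the connection is released before retrying.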
