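I'm using commons-httpclient 3.1 to read the HTML source of pages. It works fine except for pages whose content encoding is gzip: for those, the page source I get back is incomplete.
For this page, Firefox reports the content encoding as gzip.
Here are the details.
Response headers: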
status code: HTTP/1.1 200 OK
Date = Wed, 20 Jul 2011 11:29:38 GMT
Content-Type = text/html; charset=UTF-8
X-Powered-By = JSF/1.2
Set-Cookie = JSESSIONID=Zqq2Tm8V74L1LJdBzB5gQzwcLQFx1khXNvcnZjNFsQtYw41J7JQH!750321853; path=/; HttpOnly
Transfer-Encoding = chunked
Content-Length = -1
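These headers include no Content-Encoding entry, so a client cannot always rely on that header alone to know it received gzip. A fallback is to recognize gzip data directly: RFC 1952 fixes the first two bytes of every gzip stream at 0x1f 0x8b. The following is a minimal sketch of that check using only the JDK; the class GzipSniffer and the method maybeGunzip are hypothetical names, and sniffing is meant as a safety net rather than a replacement for honoring Content-Encoding when it is present.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompress a body only if it starts with the gzip magic number.
final class GzipSniffer {
    // First two bytes of every gzip member (RFC 1952): ID1 = 0x1f, ID2 = 0x8b.
    private static final int GZIP_ID1 = 0x1f;
    private static final int GZIP_ID2 = 0x8b;

    private GzipSniffer() {
    }

    /**
     * Returns a stream yielding the decompressed body if the raw bytes start
     * with the gzip magic number, or the raw bytes unchanged otherwise.
     */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        // Buffer large enough to push back the two bytes we peek at.
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        int b1 = in.read();
        int b2 = in.read();
        // The last byte unread is the first one read again, so push back b2 before b1.
        if (b2 != -1) {
            in.unread(b2);
        }
        if (b1 != -1) {
            in.unread(b1);
        }
        if (b1 == GZIP_ID1 && b2 == GZIP_ID2) {
            return new GZIPInputStream(in);
        }
        return in;
    }
}
```

Wrapping the stream this way keeps the peeked bytes available to whichever reader comes next, so plain (uncompressed) pages still read correctly.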
My code for reading the response:
HttpClient httpclient = new HttpClient();
// Connection and socket timeouts, in milliseconds.
httpclient.getParams().setParameter("http.connection.timeout",
        new Integer(50000000));
httpclient.getParams().setParameter("http.socket.timeout",
        new Integer(50000000));
// Create a method instance.
GetMethod method = new GetMethod(url);
// Provide a custom retry handler if necessary.
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));
BufferedReader reader = null;
try {
    // Execute the method.
    int statusCode = httpclient.executeMethod(method);
    if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
        strHtmlContent = null;
    } else {
        // Read the body line by line (no decompression is applied here).
        InputStream is = method.getResponseBodyAsStream();
        reader = new BufferedReader(new InputStreamReader(is, "ISO8859_8"));
        String line = null;
        StringBuffer sbResponseBody = new StringBuffer();
        while ((line = reader.readLine()) != null) {
            sbResponseBody.append(line).append("\n");
        }
        strHtmlContent = sbResponseBody.toString();
    }
} finally {
    if (reader != null) {
        reader.close();
    }
    // Return the connection to the connection manager.
    method.releaseConnection();
}
Answer 0 (score: 1)
Upgrade to HttpClient 4.1. It should handle compression transparently.
Answer 1 (score: 1)
I just ran into this same problem and solved it as follows:
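URL url = new URL("http://www.megadevs.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
// Ask for gzip explicitly; HttpURLConnection does not decompress it for you.
conn.setRequestProperty("Accept-Encoding", "gzip");
InputStream in = conn.getInputStream();
// Only unzip when the server actually sent gzip; otherwise GZIPInputStream throws.
if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
    in = new GZIPInputStream(in);
}
// Decode bytes with the page's charset (UTF-8 assumed) instead of casting each byte to a char.
BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
StringBuilder page = new StringBuilder();
char[] buf = new char[4096];
int n;
while ((n = reader.read(buf)) != -1) {
    page.append(buf, 0, n);
}
reader.close();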
Hope this helps.
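The same idea (decompress first, then decode characters) can be applied to the commons-httpclient 3.1 code from the question. Below is a minimal JDK-only sketch, assuming the Content-Encoding value has already been read from the response (it may be null); the class ResponseBodyDecoder and the method readBody are hypothetical names. It reads the whole body into a byte array and decodes it in one step, so multi-byte characters are never split. The deflate branch assumes the zlib-wrapped format that the HTTP specification defines; some servers send raw deflate, which this sketch does not handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: turn a possibly compressed HTTP body into a String.
final class ResponseBodyDecoder {
    private ResponseBodyDecoder() {
    }

    /**
     * Reads the whole body. contentEncoding (may be null) decides whether the
     * bytes are decompressed; only then are they decoded with charset.
     */
    static String readBody(InputStream raw, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = raw;
        if (contentEncoding != null) {
            String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
            if (enc.equals("gzip") || enc.equals("x-gzip")) {
                in = new GZIPInputStream(raw);
            } else if (enc.equals("deflate")) {
                // Assumes zlib-wrapped deflate data (the default Inflater format).
                in = new InflaterInputStream(raw);
            }
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes.write(buf, 0, n);
            }
            // Decode the complete byte array at once.
            return new String(bytes.toByteArray(), charset);
        } finally {
            in.close();
        }
    }
}
```

With commons-httpclient 3.1, one would presumably call method.setRequestHeader("Accept-Encoding", "gzip") before executeMethod, then pass method.getResponseBodyAsStream() together with the value of method.getResponseHeader("Content-Encoding") (null when the header is absent) and the charset from Content-Type (UTF-8 for the page above) to readBody, still calling method.releaseConnection() afterwards.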