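Getting page content from an Apache Commons HTTP request

Asked: 2011-03-09 01:00:12

Tags: java http apache-commons

I'm using Apache Commons HTTP to make a request to a web page. I can't for the life of me figure out how to get the actual content of the page; all I can get is its header information. How do I get the actual content?

Here is my sample code: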

HttpGet request = new HttpGet("http://URL_HERE/");

HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);

System.out.println("Response: " + response.toString());

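Thanks!

4 Answers:

Answer 0 (score: 15)

BalusC's answer will work just fine. If you're using version 4 or later of Apache HttpComponents, there is also a convenience method you can use: EntityUtils.toString(HttpEntity);

Here is what it looks like in code: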

HttpGet request = new HttpGet("http://URL_HERE/");
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

I hope this helps.

I'm not sure whether this is due to a different version, but I had to rewrite it as follows:

HttpGet request = new HttpGet("http://URL_HERE/");
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

Answer 1 (score: 11)

Use HttpResponse#getEntity() and then HttpEntity#getContent() to obtain the body as an InputStream.

InputStream input = response.getEntity().getContent();
// Read it the usual way.
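
In case "the usual way" is unclear, here is a minimal sketch that drains an InputStream into a String using only the JDK. The class name StreamText and the method readAll are made up for illustration, and UTF-8 is an assumption; in real code the charset should come from the response's Content-Type header. The main method uses a ByteArrayInputStream as a stand-in for response.getEntity().getContent() so it runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of HttpComponents.
public class StreamText {
    /** Reads every character from the stream, then closes it. */
    public static String readAll(InputStream input, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader and the wrapped stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(input, charset))) {
            char[] buffer = new char[4096];
            int count;
            // read() returns -1 once the end of the stream is reached.
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for response.getEntity().getContent(); charset is assumed.
        InputStream input =
                new ByteArrayInputStream("<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(input, StandardCharsets.UTF_8));
    }
}
```

With a real response you would call it as StreamText.readAll(response.getEntity().getContent(), StandardCharsets.UTF_8). Closing the stream also tells HttpClient that the connection can be released.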

Note that HttpClient is not part of Apache Commons. It is part of Apache HttpComponents.

Answer 2 (score: 1)

response.getEntity();

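You really want to look at the Javadocs; the HttpClient examples show you how to get at all the information in the response: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/index.html

Answer 3 (score: 1)

If all you want is the content of a URL, you can use the URL API, like this: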

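import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class URLTest {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.google.com.br");
        // url.openStream() gives you the response body as an InputStream,
        // so you can do whatever you want with it.
        // try-with-resources closes the Scanner and the underlying stream.
        try (Scanner in = new Scanner(url.openStream(), "UTF-8")) {
            // Print the body line by line (UTF-8 is assumed here).
            while (in.hasNextLine()) {
                System.out.println(in.nextLine());
            }
        }
    }
}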
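
If you want the status code and a header alongside the body, URL#openConnection() hands back an HttpURLConnection for http URLs. The following is only a rough sketch of that route: UrlFetch, Result and fetch are hypothetical names, the 5-second timeouts are arbitrary, and the body is assumed to be UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

// Hypothetical helper class built on the JDK's URL API.
public class UrlFetch {
    /** Simple holder for the parts of the response we care about. */
    public static final class Result {
        public final int status;
        public final String contentType;
        public final String body;

        Result(int status, String contentType, String body) {
            this.status = status;
            this.contentType = contentType;
            this.body = body;
        }
    }

    public static Result fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(5000); // arbitrary values, in milliseconds
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();        // header information
            String contentType = conn.getContentType(); // may be null
            // getInputStream() throws an IOException for 4xx/5xx responses.
            try (InputStream in = conn.getInputStream();
                 // "\\A" matches the start of input, so next() returns the whole body.
                 Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
                String body = scanner.hasNext() ? scanner.next() : "";
                return new Result(status, contentType, body);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling UrlFetch.fetch("http://URL_HERE/") would then give you result.status, result.contentType and result.body in one go.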