Question

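So I'm using Apache Commons HTTP to make a request to a web page. I can't for the life of me figure out how to get the actual content from the page; I can only get its header information. How do I get the actual content?

Here is my sample code: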

HttpGet request = new HttpGet("http://URL_HERE/");

HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);

System.out.println("Response: " + response.toString());

Thanks!

Answer 1

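BalusC's comment will work just fine. If you are using version 4 or later of Apache HttpComponents, there is also a convenience method you can use: EntityUtils.toString(HttpEntity);

Here is what it looks like in your code: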

HttpGet request = new HttpGet("http://URL_HERE/");
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

I hope this helps.

Not sure whether this is due to a different version, but I had to rewrite it as:

HttpGet request = new HttpGet("http://URL_HERE/");
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

Answer 2

Use HttpResponse#getEntity() and then HttpEntity#getContent() to obtain the content as an InputStream.

InputStream input = response.getEntity().getContent();
// Read it the usual way.

Note that HttpClient is not part of Apache Commons. It is part of Apache HttpComponents.
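
As for reading the InputStream "the usual way", here is a minimal sketch that uses only the JDK. The class name StreamToString and the helper readAll are made up for this illustration, and the ByteArrayInputStream in main is just a stand-in for response.getEntity().getContent(). It also assumes you know the charset of the page (UTF-8 here), which in a real response comes from the Content-Type header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamToString {
    // Hypothetical helper: drains the stream into memory, closes it,
    // and decodes the collected bytes with the given charset.
    static String readAll(InputStream input, Charset charset) {
        try (InputStream in = input) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            // Wrapped so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for: InputStream input = response.getEntity().getContent();
        InputStream input = new ByteArrayInputStream(
                "<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(input, StandardCharsets.UTF_8));
    }
}
```

Closing the stream when you are done matters with HttpClient, since the connection is generally not released until the entity content has been consumed or closed.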

Answer 3

response.getEntity();

You really want to take a look at the Javadocs; the HttpClient examples show you how to get at all the information in the response: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/index.html

Answer 4

If you only want the content of a URL, you can use the URL API, like this:

import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class URLTest {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.google.com.br");
        // Here you have the input stream, so you can do whatever you want with it.
        // try-with-resources closes the Scanner and the underlying stream.
        try (Scanner in = new Scanner(url.openStream())) {
            while (in.hasNextLine()) {
                System.out.println(in.nextLine());
            }
        }
    }
}
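
If you want the content as a single String rather than line by line, one common trick is to set the Scanner delimiter to \A (start of input), so the first token is the whole stream. This is only a sketch: the class name ScannerReadAll and the helper readAll are invented for the example, UTF-8 is an assumption about the page encoding, and the ByteArrayInputStream stands in for url.openStream().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ScannerReadAll {
    // Hypothetical helper: returns the entire stream as one String.
    // "\\A" matches only at the beginning of the input, so the first
    // (and only) token spans everything up to the end of the stream.
    static String readAll(InputStream input) {
        try (Scanner in = new Scanner(input, "UTF-8")) {
            in.useDelimiter("\\A");
            // An empty stream has no token at all, so guard with hasNext().
            return in.hasNext() ? in.next() : "";
        }
    }

    public static void main(String[] args) {
        // Stand-in for: InputStream input = url.openStream();
        InputStream input = new ByteArrayInputStream(
                "first line\nsecond line".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(input));
    }
}
```

Closing the Scanner also closes the underlying stream, so there is nothing else to clean up.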

Get Page Content from Apache Commons HTTP Request
