Question

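I am building a web scraper. Having read this, I understand that DNS resolution is slow, so the DNS resolver should be separated out from the rest of the crawler.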

So say you have the string urlString, http://google.com. You can then convert it to an IP with:

URL url = new URL(urlString);
InetAddress ip = InetAddress.getByName(url.getHost());

But how do you download the actual site?

With the URL, we can do it like this:

String htmlDocumentString = new Scanner(url.openStream(), "UTF-8").useDelimiter("\\A").next();

But if we want to use the already-resolved IP, do we have to rebuild the URL manually with the IP? There is no url.setHost() method, and that looks a bit messy.
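
To make the question concrete, a manual rebuild might look like the sketch below. UrlWithIp and withHost are hypothetical names, not part of the JDK. It only swaps the host; a request sent to the rebuilt URL would carry the IP rather than google.com as its host name, so a server that hosts several sites on one address may answer with a different page, and HTTPS would likely fail host-name verification of the certificate.

```java
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: rebuilds a URL so that it points at an already-resolved IP.
class UrlWithIp {

    // Returns a copy of `original` whose host is replaced by `ipLiteral`.
    // Protocol, port, path and query string are carried over unchanged.
    static URL withHost(URL original, String ipLiteral) throws MalformedURLException {
        // getFile() is the path plus the query string; getPort() is -1 when no port was given.
        return new URL(original.getProtocol(), ipLiteral, original.getPort(), original.getFile());
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://google.com/search?q=java");
        // The DNS lookup happens here, separately from the download.
        InetAddress ip = InetAddress.getByName(url.getHost());
        URL byIp = withHost(url, ip.getHostAddress());
        System.out.println(byIp);
    }
}
```

The rebuilt URL can then be passed to openStream() or openConnection() like any other URL.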

Answer 1

Reading from a URL is simple:

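import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class URLReader {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.oracle.com/");
        // Wrap the raw byte stream in a reader so it can be consumed line by line.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}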

Taken from: http://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html

Answer 2

Try this instead:

  URL oracle = new URL("http://www.oracle.com/");
  URLConnection urlc = oracle.openConnection();
  urlc.setDoInput(true);
  urlc.setRequestProperty("Accept", "text/text");
  InputStream inputStream = urlc.getInputStream();
  String myString = IOUtils.toString(inputStream, "UTF-8");

...using IOUtils from Apache Commons, as in the code above:

http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream,%20java.lang.String)
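
If adding Apache Commons is not an option, roughly the same read can be sketched with the JDK alone. This assumes Java 9 or later for InputStream.readAllBytes(); PageFetcher, readAll and fetch are hypothetical names, and the "text/html" Accept value is an assumption rather than something the answer above specifies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only variant of the snippet above.
class PageFetcher {

    // Reads the whole stream and decodes it as UTF-8 (readAllBytes needs Java 9+).
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Opens a connection to `url` and returns the response body as a String.
    static String fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("Accept", "text/html"); // assumed value, adjust as needed
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch(new URL("http://www.oracle.com/")));
    }
}
```

Decoding as UTF-8 mirrors the IOUtils call; a page served in another charset would need the charset taken from the response's Content-Type instead.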

Java: open a URL with an IP
