Java: Opening a URL with an IP address

Time: 2013-04-12 02:48:52

Tags: java url ip

I am building a web crawler. Having read this, I understand that DNS resolution is slow, so the DNS resolver should be separated from the rest of the crawler.

Say you have a String urlString with the value http://google.com. You can then convert it to an IP like this:

URL url = new URL(urlString);
InetAddress ip = InetAddress.getByName(url.getHost());

But how do you then download the actual site?

With a URL, we can do it like this:

String htmlDocumentString = new Scanner(url.openStream(), "UTF-8").useDelimiter("\\A").next();

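But if we want to use the already resolved IP, do we have to rebuild the URL manually with the IP? There is no url.setHost() method, and doing it by hand seems a bit messy.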
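For illustration, the manual rebuilding described above could look roughly like the sketch below. It relies on the four-argument java.net.URL constructor (protocol, host, port, file). The class name IpUrlSketch, the helper withHost, and the address 192.0.2.10 are hypothetical names chosen for this example, not part of the original question.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class IpUrlSketch {
    // Hypothetical helper: returns a copy of the URL with the host replaced by an IP literal.
    static URL withHost(URL original, String ipLiteral) throws MalformedURLException {
        // getFile() is the path plus the query string. getPort() is -1 when the URL
        // names no port, and the constructor treats -1 as "use the protocol default".
        return new URL(original.getProtocol(), ipLiteral, original.getPort(), original.getFile());
    }

    public static void main(String[] args) throws Exception {
        URL original = new URL("http://example.com/index.html?q=1");
        // 192.0.2.10 is a documentation-only address used here as a stand-in.
        URL byIp = withHost(original, "192.0.2.10");
        System.out.println(byIp);
    }
}
```

This only rewrites the URL. A server that hosts several sites on one IP usually still needs the original host name in the Host request header, and an HTTPS certificate check against a bare IP will normally fail, so the sketch is a starting point rather than a complete answer.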

2 answers:

Answer 0: (score: 0)

Reading from a URL is simple:

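import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class URLReader {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.oracle.com/");
        // Wrap the URL's byte stream in a reader so it can be read line by line.
        BufferedReader in = new BufferedReader(
            new InputStreamReader(oracle.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}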

Taken from: http://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html

Answer 1: (score: 0)

Try this instead:

  URL oracle = new URL("http://www.oracle.com/");
  URLConnection urlc = oracle.openConnection();
  urlc.setDoInput(true);
  urlc.setRequestProperty("Accept", "text/html");
  InputStream inputStream = urlc.getInputStream();
  String myString = IOUtils.toString(inputStream, "UTF-8");

...using IOUtils from Apache Commons IO in the code above:

http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream,%20java.lang.String)
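If adding Commons IO is not an option, the stream-to-string step could be approximated with the standard library alone. The following is a minimal sketch, assuming the response body is UTF-8 text. The class name StreamToStringSketch and the helper readFully are hypothetical and are not part of Commons IO or of the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamToStringSketch {
    // Hypothetical helper: drains the stream and decodes the collected bytes as UTF-8.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for urlc.getInputStream() so the sketch runs offline.
        InputStream in = new ByteArrayInputStream(
            "<html>ok</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFully(in));
    }
}
```

Decoding once at the end, rather than chunk by chunk, avoids splitting a multi-byte UTF-8 character across two reads.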