Question

I need to read lines from a URL's HTML page, starting at a specific line.

Currently, I have the following code:

 URL u = new URL("http://s.ll/message/" + counter);
 InputStream is = u.openStream(); // throws an IOException
 DataInputStream dis = new DataInputStream(new BufferedInputStream(is));
 String s;
 while ((s = dis.readLine()) != null) {
   if (s.contains("%")) {
     ...
   }
 }

I know the content I'm looking for won't appear before line 50.

How can I start reading from that line?

Is this the fastest way to read a URL?

Answer 1

How can I start reading from that line?

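Count the lines and ignore each line while the count is below 50. There is no magic way to jump straight to line 50 other than reading the stream and counting lines; you have to read the stream either way.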
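A minimal sketch of that counting approach, assuming the lines come from a BufferedReader. Here a StringReader stands in for the URL stream so the example runs offline; in the question's code the reader would wrap u.openStream(). The class and method names (LineCounterSkip, linesFrom) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineCounterSkip {
    // Keeps lines from firstLine (1-based) onward. Earlier lines are still
    // read from the stream; they are simply discarded.
    static List<String> linesFrom(BufferedReader reader, int firstLine) throws IOException {
        List<String> kept = new ArrayList<>();
        String s;
        int lineNo = 0;
        while ((s = reader.readLine()) != null) {
            lineNo++;
            if (lineNo < firstLine) {
                continue; // below the threshold: read and ignore
            }
            kept.add(s);
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the page body: "line 1" .. "line 60", one per line
        StringBuilder page = new StringBuilder();
        for (int i = 1; i <= 60; i++) {
            page.append("line ").append(i).append('\n');
        }
        BufferedReader reader = new BufferedReader(new StringReader(page.toString()));
        List<String> fromFifty = linesFrom(reader, 50);
        System.out.println(fromFifty.get(0)); // prints "line 50"
        System.out.println(fromFifty.size()); // prints 11
    }
}
```

In the real loop you would run the s.contains("%") check where this sketch calls kept.add(s).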

Is this the fastest way to read a URL?

It depends. However, the more common approach is BufferedReader + InputStreamReader, where you specify the charset the web page is encoded in to avoid mojibake.
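The charset to specify is usually announced by the server in the Content-Type header (for example text/html; charset=ISO-8859-1). The sketch below, using a hypothetical helper named charsetFrom, extracts that parameter and falls back to a default when it is missing or unknown. It is only an illustration under that assumption: it does not look at a meta charset declaration inside the HTML itself, which some pages rely on instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: returns the charset named in a Content-Type value
    // such as "text/html; charset=ISO-8859-1", or fallback if there is none.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8)); // prints ISO-8859-1
        System.out.println(charsetFrom("text/html", StandardCharsets.UTF_8)); // prints UTF-8
    }
}
```

With a real URL, the header value could come from URLConnection.getContentType() on u.openConnection(), and the resulting Charset passed to new InputStreamReader(conn.getInputStream(), charset).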

Answer 2

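You're on the right track. The simplest way to read data from a URL is to use the URL object. For more complex HTTP communication tasks, you might consider HTTPClient.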

You shouldn't use DataInputStream.readLine(), because it gives you no way to specify the charset used when converting bytes to a String.
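To make the difference concrete, here is a small sketch that feeds the same UTF-8 bytes to both readers. DataInputStream.readLine() (deprecated because it does not properly convert bytes to characters) turns each byte into one char, which amounts to treating the data as ISO-8859-1, so the two-byte UTF-8 encoding of "é" comes out as two wrong characters. InputStreamReader with an explicit charset decodes it correctly. The byte array stands in for a page body; the class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadLineCharsetDemo {
    // Deprecated API: each byte becomes one char, so multi-byte UTF-8
    // sequences are split into several unrelated characters.
    @SuppressWarnings("deprecation")
    static String viaDataInputStream(byte[] bytes) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes));
        return dis.readLine();
    }

    // Bytes are decoded with the charset we name explicitly.
    static String viaReader(byte[] bytes) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // "café 100%" followed by a newline, encoded as UTF-8
        byte[] utf8 = "caf\u00e9 100%\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(viaReader(utf8).length());          // prints 9
        System.out.println(viaDataInputStream(utf8).length()); // prints 10
    }
}
```

Note that a check like s.contains("%") happens to work either way, because '%' is a single ASCII byte; it is any non-ASCII text in the page that gets mangled.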

I would do it like this:

 URL u = new URL("http://s.ll/message/" + counter);
 InputStream is = u.openStream(); // throws an IOException

 // XXX notice the charset set to utf-8 here.
 BufferedReader reader = new BufferedReader(new InputStreamReader(is, "utf-8"));
 String s;
 while ((s = reader.readLine()) != null) {
   if (s.contains("%")) {
     ...
   }
 }

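Finding line 50 requires skipping to it. Since you can't know at which byte in the stream line 50 starts (that is, where the line terminator before it falls: '\n', '\r' or '\r\n', depending on Unix, classic Mac, or Windows line endings), you just have to read from the beginning.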
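A compact alternative to a manual counter, assuming Java 8 or later: BufferedReader.lines() splits on the same terminators as readLine() ('\n', '\r' or '\r\n'), and Stream.skip() discards the leading lines. The skipped lines are still read from the stream; the counting is just hidden. The helper name linesFrom is hypothetical, and a StringReader again stands in for the URL stream.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class SkipToLine {
    // Returns the lines starting at firstLine (1-based; must be >= 1).
    // lines() wraps any IOException in an UncheckedIOException.
    static List<String> linesFrom(BufferedReader reader, int firstLine) {
        return reader.lines()
                .skip(firstLine - 1) // still reads and drops the earlier lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Unix, classic Mac and Windows line endings mixed in one input
        String text = "one\ntwo\rthree\r\nfour\n";
        List<String> rest = linesFrom(new BufferedReader(new StringReader(text)), 3);
        System.out.println(rest); // prints [three, four]
    }
}
```

Applied to the reader from the code above, the same idea would read roughly as reader.lines().skip(49).filter(s -> s.contains("%")), followed by whatever processing replaces the "...".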

Read a specific line of URL data in Java
