Question

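I want to extract the value of <div class="score">4.1</div> from a website using Java (Android). I tried Jsoup. It is not exactly simple to use, and it took 8 seconds to return the value, which is very slow. You should know that the page source is about 300,000 characters long and the <div> sits somewhere in the middle.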

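Even fetching the page with HttpClient, appending the source to a StringBuilder, and then scanning the whole string until the score part is found is faster (3-4 seconds).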

I could not try HtmlUnit because it requires a large number of JAR files, and after a while Eclipse always gets confused and acts up.

Is there a faster way?

Answer 1

You can simply send an XMLHttpRequest and then search the response with the search() function. I think that would be faster.

Similar question: Retrieving source code using XMLhttpRequest in javascript

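To make the search faster, you can simply use indexOf([string to search], [start index]) and specify a start index (it does not need to be very accurate; you only need to narrow the search area).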
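
For the Java side of the question, the indexOf idea could look roughly like the sketch below. The class name ScoreFinder and the method extractScore are made up for illustration, and the code assumes the page contains the tag exactly as quoted in the question (<div class="score">). String.indexOf(String, int) is the standard two-argument overload that starts searching at the given index.

```java
class ScoreFinder {
    // Assumed markup, copied from the question; adjust if the real page differs.
    private static final String OPEN = "<div class=\"score\">";
    private static final String CLOSE = "</div>";

    /** Returns the text inside the first score div found at or after fromIndex, or null if there is none. */
    static String extractScore(String html, int fromIndex) {
        // Start searching at fromIndex instead of 0 to skip the part of the page we do not care about.
        int start = html.indexOf(OPEN, fromIndex);
        if (start < 0) {
            return null;
        }
        start += OPEN.length();
        int end = html.indexOf(CLOSE, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>intro</p><div class=\"score\">4.1</div></body></html>";
        System.out.println(extractScore(html, 10)); // prints 4.1
    }
}
```

The start index must not be larger than the position of the opening tag, otherwise the search misses it, so a conservative guess (for example a third of the page length when the element is known to be near the middle) is safer than an aggressive one.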

Answer 2

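This is what I ended up doing. The problem was that I was reading the web page line by line, gluing the lines together in a StringBuilder, and then searching for the specific part. Then I asked myself: why am I reading the page line by line only to glue the lines back together? So I read the page into a byte array and converted it to a String instead. The scraping time dropped to under a second!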

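import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// url holds the page address; on Android, run this off the main thread.
ByteArrayOutputStream outputDoc = new ByteArrayOutputStream();
String page = null;
try {
    InputStream is = new URL(url).openStream();
    try {
        byte[] buf = new byte[1024];
        int len;
        // read() returns -1 at end of stream
        while ((len = is.read(buf)) != -1) {
            outputDoc.write(buf, 0, len);
        }
    } finally {
        is.close();
    }
    page = new String(outputDoc.toByteArray(), "UTF-8");
    // here I used page.indexOf to find the part
} catch (IOException e) {
    e.printStackTrace();
}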
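
A small sketch of the same read loop wrapped in a method that takes any InputStream, so it can be tried without a network connection. The names PageReader and readAll are hypothetical, the 8192-byte buffer is an arbitrary choice, and UTF-8 is assumed as the page encoding, as in the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class PageReader {
    /** Reads the stream until end of input and decodes the collected bytes as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int len;
        // read() returns -1 once the stream is exhausted
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new URL(url).openStream(), so the example runs offline.
        byte[] fake = "<div class=\"score\">4.1</div>".getBytes("UTF-8");
        String page = readAll(new ByteArrayInputStream(fake));
        System.out.println(page.indexOf("class=\"score\"") >= 0); // prints true
    }
}
```

With a real page, the argument would be the stream returned by new URL(url).openStream(), closed by the caller once readAll returns.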
