Question

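I want to find a piece of text in a web page's HTML as quickly as possible. I suspect my program is about as slow as it could be. Do you have any tips?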

My code looks like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class Main {
    public static void main(String[] args) throws Exception
    {
        URL url = new URL("http://stackoverflow.com/");
        BufferedReader in = new BufferedReader(
            new InputStreamReader(url.openStream()));

        String isPresent = "img";
        boolean on = false;

        String inputLine;
        while ((inputLine = in.readLine()) != null)
        {
            if (inputLine.contains(isPresent)) on = true;   // This takes a long time!
        }
        in.close();
    }
}

The page contains a lot of HTML and I have little experience with HTML. The line if(inputLine.contains(isPresent)) takes many seconds to run. Is there a more time-efficient way to do this? Thanks.

Answer 1

Once on is set to true, you can exit the loop.

To do that, change your loop condition to:

while ((inputLine = in.readLine()) != null && !on)
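The same early-exit idea can be sketched as a small helper that returns as soon as the first match is found. This is a minimal sketch, not the asker's code: the class name EarlyExitSearch and the method containsText are hypothetical, and the method takes any Reader so it can be tried without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class EarlyExitSearch {
    // Hypothetical helper: returns true as soon as one line contains the needle,
    // so the rest of the input is never read.
    static boolean containsText(Reader source, String needle) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains(needle)) {
                    return true; // early exit on the first match
                }
            }
            return false; // reached the end of the input without a match
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the question's URL, the call would look like EarlyExitSearch.containsText(new InputStreamReader(url.openStream()), "img"). Returning early only saves time when the text appears before the end of the page. If it is absent, the whole page is still downloaded and scanned.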

Answer 2

If what you mean is parsing, try Jsoup. That way you can check for any tag, count its occurrences, and so on. There are lots of possibilities.

Document doc = Jsoup.connect("http://stackoverflow.com/").get();
boolean on = false;
if(doc.select("img").size() > 0){
    on = true;
}
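If adding a library is not an option, occurrences can be counted roughly with the standard library alone. The sketch below is an assumption-laden alternative to Jsoup, not a replacement for it: the class OccurrenceCount and the method countOccurrences are hypothetical names. It does plain substring matching, so it also counts matches inside comments, scripts, or attribute values.

```java
class OccurrenceCount {
    // Hypothetical helper: counts non-overlapping occurrences of needle in haystack.
    // This is plain text matching and knows nothing about HTML structure.
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // avoid an endless loop on an empty search string
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // continue after the current match
        }
        return count;
    }
}
```

Searching for "<img" rather than "img" reduces false positives from words that merely contain those letters, but only a real parser such as Jsoup can tell tags from text reliably.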

Answer 3

You can use a Java library that parses XML and HTML documents, such as JSoup or HtmlUnit. After adding the JSoup JAR to your classpath, try the code below.

Document doc = Jsoup.connect("http://stackoverflow.com/").get();
// text() returns the combined text of the document with the markup stripped
String docContent = doc.text();
boolean on = docContent.contains("searchedText");

Efficiently finding a string in HTML code in Java
