Question

I want to get the softwareVersion from the HTML code below.

<div class="title">Current Version</div> <div class="content" itemprop="softwareVersion"> 1.1.3  </div> </div> <div class="meta-info"> <div class="title">Requires Android</div> <div class="content" itemprop="operatingSystems">     2.2 and up   </div> </div>

I am using the code below:

String Html = GetHtml("https://play.google.com/store/apps/details?id="+ AppID);
Pattern pattern = Pattern.compile("softwareVersion\">[^<]*</dd");
Matcher matcher = pattern.matcher(Html);
matcher.find();

String GetHtml(String url1) 
    {
        String str = "";
        try 
        {
            URL url = new URL(url1);
            URLConnection spoof = url.openConnection();
            spoof.setRequestProperty("User-Agent",
                    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    spoof.getInputStream()));
            String strLine = "";
            // Loop through every line in the source
            while ((strLine = in.readLine()) != null) 
            {
                str = str + strLine;
            }
        } 
        catch (Exception e) 
        {
        }
        return str;
    }

But matcher.find() always returns false. I think there is something wrong with my pattern. Can anyone help me? Thanks.

Answer 1

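As others have commented, I would normally use an HTML parser to extract content from HTML. However, in a case where you are only pulling a single piece of information out of a string, I can see why you would want to use a regular expression.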

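All you need to do is the following. The problem with your regex is the extra d: the pattern ends in </dd, but the HTML closes the element with </div>, so the second d can never match. Also, if you wrap the part you care about in parentheses, you can grab it with .group.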

import java.util.regex.*;

public class R {

  public static void main(String[] args){
     String Html = "<div class=\"title\">Current Version</div> <div class=\"content\" itemprop=\"softwareVersion\"> 1.1.3  </div> </div> <div class=\"meta-info\"> <div class=\"title\">Requires Android</div> <div class=\"content\" itemprop=\"operatingSystems\">     2.2 and up   </div> </div>";

     Pattern pattern = Pattern.compile("softwareVersion\">([^<]*)</d");
     Matcher matcher = pattern.matcher(Html);
     System.out.println(matcher.find());
     System.out.println(matcher.group(1));
  }
}
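
Note that group(1) above still includes the whitespace around the version, and calling group() after a failed find() throws an IllegalStateException. A minimal sketch of a more defensive variant follows, assuming the markup looks like the snippet in the question. The class name VersionExtractor and the method extractVersion are made up for illustration, and returning null when nothing matches is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionExtractor {

    // Skips whitespace on both sides of the value and stops at the next tag.
    // The lazy group ([^<]*?) leaves trailing whitespace to the following \s*.
    private static final Pattern VERSION =
            Pattern.compile("itemprop=\"softwareVersion\"\\s*>\\s*([^<]*?)\\s*<");

    /** Returns the version without surrounding whitespace, or null if the pattern does not match. */
    public static String extractVersion(String html) {
        Matcher m = VERSION.matcher(html);
        // Only call group() after find() has returned true.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\" itemprop=\"softwareVersion\"> 1.1.3  </div>";
        System.out.println(extractVersion(html)); // prints 1.1.3
    }
}
```

If extractVersion returns null on the live page, it is worth checking that the downloaded HTML is not empty, since GetHtml in the question swallows every exception and returns an empty string on failure.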
