Question

I have a regular expression that I want to match the content attribute of an HTML meta tag and return its value. For example, given:

<meta name="description" content="Some website description.">

I want to get

Some website description.

and nothing else. In my case, I use this pattern:

private static Pattern siteMetaTagDescriptionAttributePattern = Pattern.compile("name=\"description\"(\\s*)content=\"(.*)\"");
Matcher matcher = siteMetaTagDescriptionAttributePattern.matcher(siteContentLine);
String siteDescription = "";
while(matcher.find()) {
  siteDescription = matcher.group(2);
}

But it matches all the way to the end, which in this case gives:

Some website description.">

What should I do to get only the value inside the content attribute, which in this example is

Some website description.

Thank you very much.

Answer 1

Consider using a parser instead of a regular expression. You can use, for example, Jsoup:

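import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<meta name=\"description\" content=\"Some website description.\">";

// Parse the markup, then read the content attribute of the description meta tag
Document doc = Jsoup.parse(html);
System.out.println(doc.select("meta[name=description]").attr("content"));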

Output:

Some website description.

Answer 2

If you insist on a regex:

(?<=name=\"description\" content=\")[^\"]*(?=\")
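For reference, here is a minimal sketch of how that pattern might be used from Java, next to a variant of the question's own pattern with the greedy (.*) replaced by [^\"]*. The class name MetaDescriptionDemo and the helper methods withLookaround and withCapture are made up for illustration. Note that the lookbehind version expects exactly one space between the two attributes, while the capturing version keeps the question's \\s*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class MetaDescriptionDemo {
    // Lookaround pattern from the answer: the whole match is the attribute value.
    private static final Pattern LOOKAROUND =
            Pattern.compile("(?<=name=\"description\" content=\")[^\"]*(?=\")");

    // The question's pattern with (.*) replaced by [^\"]*, so the capture
    // stops at the first closing quote instead of the last one on the line.
    private static final Pattern CAPTURING =
            Pattern.compile("name=\"description\"\\s*content=\"([^\"]*)\"");

    static String withLookaround(String line) {
        Matcher m = LOOKAROUND.matcher(line);
        return m.find() ? m.group() : "";
    }

    static String withCapture(String line) {
        Matcher m = CAPTURING.matcher(line);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // A second tag on the same line is what makes a greedy (.*) overrun.
        String line = "<meta name=\"description\" content=\"Some website description.\">"
                + "<meta name=\"keywords\" content=\"a, b\">";
        System.out.println(withLookaround(line));
        System.out.println(withCapture(line));
    }
}
```

Both calls should print only the description text. A greedy .* backtracks from the end of the line to the last double quote, so anything quoted later on the same line ends up in the capture; [^\"]* cannot cross a quote at all.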

Java regex to match the content attribute value of a meta tag
