Question

I am trying to read a web page and get the last-modified date from its meta tag. For example:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta http-equiv="last-modified" content="Mon, 17 Sep 2012 13:57:35 SGT" />
</head>

I am reading the page line by line. How do I build a regular expression for this case? I am fairly new to regular expressions. I tried:

line.matches("<meta http-equiv=\"last-modified\" content=\"(\w)*\" /> ");

but I don't think this is correct.

Answer 1

Although you should never use regex to parse HTML, if you insist, here is a regex option:

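import java.util.regex.Matcher;
import java.util.regex.Pattern;

// sampleString is assumed to hold one line of the page, e.g. the last-modified <meta> tag
Pattern metaPattern = Pattern.compile("meta .*\"last-modified\" content=\"(.*?)\"");
Matcher metaMatch = metaPattern.matcher(sampleString);
// find() rather than matches(): the pattern does not cover the whole line
if (metaMatch.find())
{
    System.out.println(metaMatch.group(1));
}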

Answer 2

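You cannot use \w alone in your pattern, because the target text contains non-word characters (spaces, a comma, and colons).

Try something like: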

String line = "<meta http-equiv=\"last-modified\" content=\"Mon, 17 Sep 2012 13:57:35 SGT\" />";

Pattern p = Pattern.compile("<meta .*last-modified.*content=\"(.*)\".*");
Matcher m = p.matcher(line);
if (m.matches())
    System.out.println(m.group(1));

Output:

Mon, 17 Sep 2012 13:57:35 SGT
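Since the question mentions reading the page line by line, here is a minimal sketch of how the same idea might be applied across several lines. The class name `MetaLineScanner` and the method `findLastModified` are made up for illustration, and the pattern assumes the `http-equiv` attribute comes before `content`, as in the sample markup. It uses `find()` so the tag can sit anywhere in a line, and `[^"]*` so the capture stops at the first closing quote.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
class MetaLineScanner {
    // [^"]* stops at the first closing quote, so text after the value is ignored.
    private static final Pattern LAST_MODIFIED = Pattern.compile(
            "<meta\\s+http-equiv=\"last-modified\"\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the content value from the first line that contains the tag.
    static Optional<String> findLastModified(List<String> lines) {
        for (String line : lines) {
            Matcher m = LAST_MODIFIED.matcher(line);
            // find() searches inside the line; matches() would need the whole line to fit.
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "<head>",
                "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=windows-1252\">",
                "<meta http-equiv=\"last-modified\" content=\"Mon, 17 Sep 2012 13:57:35 SGT\" />",
                "</head>");
        System.out.println(findLastModified(lines).orElse("not found"));
    }
}
```

With the sample lines from the question this should print the same date string as above. If the attributes can appear in a different order on the pages you read, the pattern would need to be loosened.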

Answer 3

Here is a solution without regular expressions.

Of course, you have to use it carefully and do some checks beforehand.

String data = "<head>" +  
              "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=windows-1252\">" +
              "<meta http-equiv=\"last-modified\" content=\"Mon, 17 Sep 2012 13:57:35 SGT\" />" + 
              "</head>";

String key =  "<meta http-equiv=\"last-modified\" content=\"";

int from = data.lastIndexOf(key);
String tag = data.substring(from + key.length());
int to = tag.indexOf("\"");
String date = tag.substring(0, to);
System.out.println(date);
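As for the checks mentioned above, one possible shape is sketched below. `lastIndexOf` and `indexOf` return -1 when nothing is found, so both results are tested before calling `substring`. The class name `MetaValueExtractor` and the method `extract` are hypothetical, chosen only for this example.

```java
import java.util.Optional;

// Hypothetical helper; the class and method names are assumptions for illustration.
class MetaValueExtractor {
    // Returns the text between the end of key and the next double quote, if both exist.
    static Optional<String> extract(String data, String key) {
        int from = data.lastIndexOf(key);
        if (from < 0) {
            return Optional.empty();   // key is not present in the data
        }
        int start = from + key.length();
        int end = data.indexOf('"', start);
        if (end < 0) {
            return Optional.empty();   // no closing quote after the value
        }
        return Optional.of(data.substring(start, end));
    }

    public static void main(String[] args) {
        String key = "<meta http-equiv=\"last-modified\" content=\"";
        String data = "<head>" + key + "Mon, 17 Sep 2012 13:57:35 SGT\" /></head>";
        System.out.println(extract(data, key).orElse("not found"));
        System.out.println(extract("<head></head>", key).orElse("not found"));
    }
}
```

Returning an `Optional` is just one way to signal a missing tag; returning null or throwing an exception would work as well, depending on how the caller wants to handle it.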

Regex to find a meta value
