Question

I want to know how to extract '4151' from the following code:

</th><td><a class="external exitstitial" rel="nofollow" href="http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151">Look up price</a>

I'd like to use a regular expression, but if there's a better way, I'm open to it!

Answer 1

The following works for me, assuming you have already extracted the value of the href attribute:

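import java.util.regex.Matcher;
import java.util.regex.Pattern;

String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
// Capture the run of digits that follows "?obj="
Pattern p = Pattern.compile("\\?obj=(\\d+)");
Matcher m = p.matcher(href);
if (m.find()) {
    System.out.println(m.group(1)); // group 1 holds only the captured digits
}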

Outputs "4151".
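Once you already have the href, a regex is not strictly necessary: the JDK's java.net.URI can split off the query string for you. The sketch below is an alternative technique, not part of the answer above; the class ObjFromQuery and its helper queryParam are hypothetical names, and it assumes simple key=value pairs separated by "&".

```java
import java.net.URI;

public class ObjFromQuery {
    // Hypothetical helper: returns the value of query parameter `key`, or null if absent.
    static String queryParam(String url, String key) {
        String query = URI.create(url).getQuery(); // "obj=4151" for the sample URL
        if (query == null) {
            return null; // URL has no query string at all
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(key)) {
                return pair.substring(eq + 1);
            }
        }
        return null; // query string exists but has no such key
    }

    public static void main(String[] args) {
        String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
        System.out.println(queryParam(href, "obj")); // prints 4151
    }
}
```

Note that getQuery() returns the decoded query, so percent-encoded "&" or "=" characters inside a value would confuse the split; for this URL that does not matter.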

Answer 2

Here are some HTML parser libraries: htmlparser, jsoup and jtidy.

In your case a regular expression is probably fine, but here is the classic post on why you should avoid regex for HTML parsing.
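If you want the parser route without adding a third-party dependency, the JDK ships an old, lenient HTML parser in javax.swing.text.html.parser. The following is a sketch under that assumption (it is not htmlparser, jsoup or jtidy): it collects the href of every a tag and then applies the same "\\?obj=(\\d+)" pattern from Answer 1 to each href. HrefExtractor and hrefs are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HrefExtractor {
    // Hypothetical helper: collects the href attribute of every <a> start tag.
    static List<String> hrefs(String html) {
        List<String> result = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        result.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "</th><td><a class=\"external exitstitial\" rel=\"nofollow\" "
                + "href=\"http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151\">Look up price</a>";
        Pattern obj = Pattern.compile("\\?obj=(\\d+)"); // same pattern as Answer 1
        for (String href : hrefs(html)) {
            Matcher m = obj.matcher(href);
            if (m.find()) {
                System.out.println(m.group(1));
            }
        }
    }
}
```

The parser tolerates the stray closing </th> at the start of the fragment; errors it encounters are passed to handleError, which this callback does not override.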

Answer 3

This regular expression will get the number:

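import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Finds the first run of digits anywhere in subjectString
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = null;
if (regexMatcher.find()) {
    resultString = regexMatcher.group();
}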

This code is untested and assumes your HTML string has been assigned to the "subjectString" variable.
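Because "\\d+" returns the first run of digits anywhere in the string, any digit earlier in the HTML (a class name like "col2", a width attribute) would be returned instead of 4151. A sketch that anchors the match to the href attribute is shown below; it assumes double-quoted attribute values, and ObjFromHtml and objId are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObjFromHtml {
    // Matches an href="..." value containing "?obj=<digits>" or "&obj=<digits>",
    // capturing only the digits. Assumes double-quoted attribute values.
    private static final Pattern OBJ_IN_HREF =
            Pattern.compile("href=\"[^\"]*[?&]obj=(\\d+)");

    // Hypothetical helper: returns the obj id, or null if no matching href exists.
    static String objId(String html) {
        Matcher m = OBJ_IN_HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "col2" contains a digit that a bare \d+ would match first
        String html = "<td class=\"col2\"><a rel=\"nofollow\" "
                + "href=\"http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151\">Look up price</a>";
        System.out.println(objId(html)); // prints 4151
    }
}
```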

Using a regular expression to get information inside an HTML tag
