Question

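I am writing a regular expression that replaces every occurrence of the substring ">(Some Text)</A> (case-insensitive) in an HTML document with .html">(Some Text)</A>.

However, it does not seem to produce the expected replacements in the output page.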

Pattern fixRest = Pattern.compile("(\">.*?</a>)", Pattern.CASE_INSENSITIVE);
Matcher mh2 = fixRest.matcher(pgText);
mh2.replaceAll(".html$1");

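When I view the output page, a large number of href links still lack the .html suffix.

Is there something wrong with my regular expression? When I run it in RegexBuddy against the same page that is held in pgText, it produces the result I expect.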

Answer 1

mh2.replaceAll(".html$1");

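does not modify mh2, and it does not modify pgText either. Matcher.replaceAll returns the result as a new String, which this code discards. Try using the result, as in

pgText = mh2.replaceAll(".html$1");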
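
As a minimal sketch of that fix, the snippet below wraps the question's pattern in a small method and keeps the returned string. The class name HrefSuffixDemo, the method name addHtmlSuffix, and the sample input are assumptions made for illustration; they are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefSuffixDemo {
    // Same pattern as in the question: the closing quote of the attribute,
    // '>', the link text (matched lazily), and the closing tag.
    private static final Pattern FIX_REST =
            Pattern.compile("(\">.*?</a>)", Pattern.CASE_INSENSITIVE);

    // Returns a copy of pgText with ".html" inserted before each match.
    static String addHtmlSuffix(String pgText) {
        Matcher mh2 = FIX_REST.matcher(pgText);
        // replaceAll leaves pgText untouched and returns the new string,
        // so the return value is what the caller has to keep.
        return mh2.replaceAll(".html$1");
    }

    public static void main(String[] args) {
        String pgText = "<a href=\"page\">Some Text</A>"; // hypothetical input
        pgText = addHtmlSuffix(pgText);                   // keep the result
        System.out.println(pgText); // <a href="page.html">Some Text</A>
    }
}
```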

In general, do not use regular expressions to parse HTML.

Here are some examples of how this approach fails:

<a href='...'>foo</a>                  <!-- single quotes -->
<a href=...>foo</a>                    <!-- no quotes -->
<a href="..." title="">foo</a>         <!-- the href isn't the last attribute. -->
<a href="..."><img src="...">foo</a>   <!-- tag inside link -->
<a href="..." >foo</a>                 <!-- space between attribute and end -->
<a href="...">"y">"x"</a>              <!-- text node contains '>' -->

I am sure you can think of more.
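
To make a few of these failure cases concrete, here is a rough sketch that applies the question's pattern to three of the link forms listed above. The class name HrefRegexPitfalls, the method name apply, and the href value "page" are placeholders chosen for this illustration.

```java
import java.util.regex.Pattern;

class HrefRegexPitfalls {
    // The pattern from the question, unchanged.
    private static final Pattern FIX_REST =
            Pattern.compile("(\">.*?</a>)", Pattern.CASE_INSENSITIVE);

    // Applies the question's replacement and returns the resulting string.
    static String apply(String html) {
        return FIX_REST.matcher(html).replaceAll(".html$1");
    }

    public static void main(String[] args) {
        // Single quotes: the sequence "> never occurs, so nothing is replaced.
        System.out.println(apply("<a href='page'>foo</a>"));
        // Space before '>': again no "> sequence, so nothing is replaced.
        System.out.println(apply("<a href=\"page\" >foo</a>"));
        // href is not the last attribute: ".html" ends up inside title, not href.
        System.out.println(apply("<a href=\"page\" title=\"\">foo</a>"));
    }
}
```

The first two inputs come back unchanged, and the third is changed in the wrong attribute, which is the kind of silent failure an HTML parser would avoid.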

What is wrong with my regular expression for replacing the contents of HREF links in Java?
