Question

I'm trying to do something really simple, but I'm just bad at regular expressions.

My goal is to replace:

<a href="http://www.google.com">Link To Google</a>

with:

<b>Link To Google</b>

in Java.

I tried:

String input = "<a href=\"http://www.google.com\">Link to Google</a>";
String Regex1 = "<a href(.*)>";
String Regex2 = "</a>";
String output = test.replace(Regex1, "<b>");
output = test.replace(Regex2, "</b>");

But the first pattern, Regex1, doesn't match my input. Any clues?

Thanks in advance!

Answer 1

The pattern itself matches just fine, and even if it didn't, you should not use regular expressions to parse HTML.

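You want to perform the second replacement on the result of the first one, not on the original string. Also note that String.replace treats its first argument as literal text; replaceAll is the method that interprets it as a regex:

String output = test.replaceAll(Regex1, "<b>");
output = output.replaceAll(Regex2, "</b>");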

You can make it work for your example with:

String Regex1 = "<a href.*?>";

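This makes the quantifier non-greedy, so it stops at the first > instead of the last one. However, this expression breaks very easily on the slightest change in the input HTML, which is (one of the reasons) why you should not use regular expressions to process HTML.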
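A minimal sketch of the greedy versus non-greedy difference, assuming the sample input from the question. The class and method names (LinkToBold, greedy, lazy) are made up for illustration, and the sketch uses replaceAll because String.replace does not interpret regex syntax.

```java
public class LinkToBold {
    // Greedy: ".*" runs to the LAST '>' in the string,
    // so the link text and the closing </a> are swallowed too.
    static String greedy(String html) {
        return html.replaceAll("<a href(.*)>", "<b>")
                   .replaceAll("</a>", "</b>");
    }

    // Non-greedy: ".*?" stops at the FIRST '>' after "<a href".
    static String lazy(String html) {
        return html.replaceAll("<a href.*?>", "<b>")
                   .replaceAll("</a>", "</b>");
    }

    public static void main(String[] args) {
        String input = "<a href=\"http://www.google.com\">Link to Google</a>";
        System.out.println(greedy(input)); // <b>
        System.out.println(lazy(input));   // <b>Link to Google</b>
    }
}
```

Note that the second replacement is chained onto the result of the first, rather than applied to the original string again.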

Some simple examples of input that the regex above does not work for:

<A HREF="http://www.google.com">
<a  href="http://www.google.com">
<a href="http://www.google.com"
>
<a href=">">
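To see these failures concretely, here is a small sketch that runs the non-greedy pattern against each of the four inputs. The class name FragileRegex and the helper lazy are hypothetical, chosen only for this illustration.

```java
public class FragileRegex {
    // Applies the non-greedy pattern from the answer to a single tag.
    static String lazy(String html) {
        return html.replaceAll("<a href.*?>", "<b>");
    }

    public static void main(String[] args) {
        String[] samples = {
            "<A HREF=\"http://www.google.com\">",   // uppercase: pattern is case-sensitive
            "<a  href=\"http://www.google.com\">",  // two spaces: pattern expects exactly one
            "<a href=\"http://www.google.com\"\n>", // newline: '.' does not match it by default
            "<a href=\">\">"                        // '>' inside the attribute: match ends too early
        };
        for (String s : samples) {
            System.out.println(lazy(s));
        }
    }
}
```

The first three inputs come back unchanged, and the last one is cut off in the middle of the attribute value, leaving "> behind.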

Answer 2

Use a parser. They are easy to use and are always the more correct solution.

jsoup (http://jsoup.org) can easily handle your task like this:

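import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

File input = new File("your.html");
Document doc = Jsoup.parse(input, "UTF-8"); // throws IOException

// Select every <a> element that has an href attribute
Elements links = doc.select("a[href]");

// Replace each link with a <b> element carrying the same text
for (Element link : links) {
  Element bold = doc.createElement("b").appendText(link.text());
  link.replaceWith(bold);
}

// now do something with...
// doc.outerHtml()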

Answer 3

If you want it to work, replace Regex1 with

<a href=\"(.*)\">

Then:

output = output.replace(Regex2,"</b>")

Answer 4

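I don't know how regular expressions work in Java, but there must be a "capture group" concept:

Your initial regex would be: "<a\s+href\s*=\s*".*?">(.*?)</a>"

You would replace it with: "<b>$1</b>" (where $1 refers to the group captured between the parentheses in the first regex)

The regex matches the whole link and replaces it with <b></b> wrapped around the captured link text.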
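Java does support capture groups, and replaceAll accepts $1 in the replacement string. A minimal sketch of the idea above, with a hypothetical class name (CaptureGroupDemo) and helper (toBold); the backslashes and quotes are escaped for a Java string literal:

```java
public class CaptureGroupDemo {
    // Group 1 captures the link text between the opening and closing tags.
    private static final String LINK =
            "<a\\s+href\\s*=\\s*\".*?\">(.*?)</a>";

    // Replaces each matched link with its captured text wrapped in <b>...</b>.
    static String toBold(String html) {
        return html.replaceAll(LINK, "<b>$1</b>");
    }

    public static void main(String[] args) {
        String input = "<a href=\"http://www.google.com\">Link To Google</a>";
        System.out.println(toBold(input)); // <b>Link To Google</b>
    }
}
```

This does the whole job in one pass, but it is still a regex over HTML and shares the fragility described in the other answers.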
