Question

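I'm trying to apply a Java regular expression to the text below to extract content. When the text contains only one href it finds the content correctly, but when there are more, the match runs on to the end of the text. Here is the regex pattern: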

Pattern pattern = Pattern.compile("\\\"\\>(.*)\\</a\\>\\<br\\>", Pattern.DOTALL);

Here is the text:

<div><b>Attachments:</b> <a href="http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG</a><br><a href="http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif</a><br><a href=""></a></div>

So if there is only the href for 1.JPG, it finds the correct answer:

http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG

But when I add yinYang.gif, it finds the following instead:

">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG</a><br><a href="http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif</a><br>

How can I change this so that it finds all the values between <a> ... </a> as separate groups?

Answer 1

Change your pattern to a non-greedy one:

"\\\"\\>(.*?)\\</a\\>\\<br\\>"

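As a rough sketch of how the non-greedy pattern might be used: a Matcher.find() loop picks up each link text as its own capture. The class and method names below are made up for illustration, the sample URLs are placeholders, and the backslash escapes before <, > and / are dropped because those characters need no escaping in a Java regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
final class AnchorTextRegexDemo {

    // The reluctant quantifier (.*?) stops at the first "</a><br>" it can reach,
    // whereas the greedy (.*) runs on to the last one in the input.
    private static final Pattern LINK_TEXT =
            Pattern.compile("\">(.*?)</a><br>", Pattern.DOTALL);

    // Returns the captured text of every match, in document order.
    static List<String> extractLinkTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = LINK_TEXT.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        // Placeholder input shaped like the text in the question.
        String html = "<div><b>Attachments:</b> "
                + "<a href=\"http://example.invalid/8/1.JPG\">http://example.invalid/8/1.JPG</a><br>"
                + "<a href=\"http://example.invalid/8/yinYang.gif\">http://example.invalid/8/yinYang.gif</a><br>"
                + "<a href=\"\"></a></div>";
        for (String text : extractLinkTexts(html)) {
            System.out.println(text);
        }
    }
}
```

With this input the loop yields the two URLs separately. The trailing empty anchor is skipped because it is not followed by <br>.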
However, a word of warning is in order: don't do this.

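You are effectively trying to parse (semi-)structured information with regular expressions. Experience shows that if you go down this route, you are doomed to fail. Either regexes will turn out not to be powerful enough to solve your problem in the end (think of nested structures), or you will produce unmaintainable code. Probably both.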
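For comparison, here is a minimal sketch of the parser route. The answer does not name a parser; this assumes the HTML parser that ships with the JDK in javax.swing.text.html, and a dedicated HTML library may well be a better fit. The class and method names are hypothetical and the sample URLs are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original question or answer.
final class AnchorTextParserDemo {

    // Collects the non-empty text content of every <a> element, in document order.
    static List<String> extractLinkTexts(String html) {
        final List<String> texts = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private StringBuilder current; // non-null only while inside an <a> element

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current != null) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && current != null) {
                    if (current.length() > 0) {
                        texts.add(current.toString());
                    }
                    current = null;
                }
            }
        };
        try {
            // The third argument tells the parser to ignore any charset declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Placeholder input shaped like the text in the question.
        String html = "<div><b>Attachments:</b> "
                + "<a href=\"http://example.invalid/8/1.JPG\">http://example.invalid/8/1.JPG</a><br>"
                + "<a href=\"http://example.invalid/8/yinYang.gif\">http://example.invalid/8/yinYang.gif</a><br>"
                + "<a href=\"\"></a></div>";
        for (String text : extractLinkTexts(html)) {
            System.out.println(text);
        }
    }
}
```

Unlike the regex, this does not depend on each link being followed by <br>, so small changes in the markup are less likely to break it.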
