Question

I have this text, and I'm trying to print a1 and a2:

<a href="a1" title="t1"> k1 </a>
<a href="a2" title="t2"> k2 </a>

Here is my attempt:

using System;
using System.Text.RegularExpressions;

string html = "<a href=\"a1\" title=\"t1\"> k1 </a>";
html += "<a href=\"a2\" title=\"t2\"> k2 </a>";

// Here is how I think my regular expression should work:
// <a href=" [something that is not a quote, 0 or more times] " [anything] </a>
Regex regex = new Regex("<a href=\"([^\"]*)\".*</a>");
foreach (Match match in regex.Matches(html))
    Console.WriteLine(match.Groups[1]);

Why does it only print a1? I'm pretty sure I'm doing it right. What am I missing?

Answer 1

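The .* in your regex is greedy: it consumes as many characters as it can, so it runs all the way to the last </a> in the string (the second one). The entire input therefore becomes a single match, and its group 1 is a1. What you need is the lazy quantifier .*?, which consumes as few characters as possible, so each match stops at the first </a> after its href: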

Regex regex = new Regex("<a href=\"([^\"]*)\".*?</a>");
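To see the difference side by side, here is a minimal sketch that runs both patterns against the question's input and collects group 1 of every match. It assumes C# 9 or later (top-level statements), and the helper name Hrefs is hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same input as in the question: two links concatenated into one string.
string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
            + "<a href=\"a2\" title=\"t2\"> k2 </a>";

// Hypothetical helper: returns group 1 (the href value) of every match.
List<string> Hrefs(string pattern, string input)
{
    var result = new List<string>();
    foreach (Match m in Regex.Matches(input, pattern))
        result.Add(m.Groups[1].Value);
    return result;
}

// Greedy: .* backtracks only as far as the LAST </a>, so the whole string is one match.
List<string> greedy = Hrefs("<a href=\"([^\"]*)\".*</a>", html);

// Lazy: .*? stops at the FIRST </a> after each href, giving one match per link.
List<string> lazy = Hrefs("<a href=\"([^\"]*)\".*?</a>", html);

Console.WriteLine("greedy: " + string.Join(", ", greedy)); // greedy: a1
Console.WriteLine("lazy:   " + string.Join(", ", lazy));   // lazy:   a1, a2
```

The only change between the two patterns is the trailing ? after .*, which switches the quantifier from greedy to lazy.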

Also, see: Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
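In the spirit of that link, a minimal sketch of reading the href values with a parser instead of a regex follows. It relies on an assumption that holds only for this particular input: the fragment happens to be well-formed XML once it is wrapped in a dummy root element (the name root is arbitrary), so the standard System.Xml.Linq API can parse it. Real-world HTML (unclosed tags, unquoted attributes, and so on) generally is not valid XML and would need a dedicated HTML parser instead.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Same input as in the question.
string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
            + "<a href=\"a2\" title=\"t2\"> k2 </a>";

// The fragment has two top-level elements, so wrap it in a dummy root
// to turn it into a single well-formed XML document.
XElement root = XElement.Parse("<root>" + html + "</root>");

// Read the href attribute of every <a> element instead of pattern-matching text.
string[] hrefs = root.Descendants("a")
                     .Select(a => (string)a.Attribute("href"))
                     .ToArray();

foreach (string href in hrefs)
    Console.WriteLine(href); // prints a1, then a2
```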
