I have this text, and I am trying to print a1 and a2:
<a href="a1" title="t1"> k1 </a>
<a href="a2" title="t2"> k2 </a>
Here is my attempt:
string html = "<a href=\"a1\" title=\"t1\"> k1 </a>";
html += "<a href=\"a2\" title=\"t2\"> k2 </a>";
// Here is how I think my regular expression should work:
// <a href=" [anything that is not a quote, 0 or more times] " [anything] </a>
Regex regex = new Regex("<a href=\"([^\"]*)\".*</a>");
foreach (Match match in regex.Matches(html))
    Console.WriteLine(match.Groups[1].Value);
Why does this print only a1? I am fairly sure I did it right. What am I missing?
Answer 0: (score: 2)
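The .* in your regex is greedy: it consumes every character up to the second (last) </a>, so both anchors end up inside a single match. Use the lazy quantifier .*? instead, so that it consumes only as many characters as needed to reach the first </a>: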
Regex regex = new Regex("<a href=\"([^\"]*)\".*?</a>");
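To see the difference side by side, here is a minimal sketch that runs both patterns against the string from the question. The class name HrefDemo and the helper ExtractHrefs are hypothetical names introduced only for this illustration; the patterns and input are the ones above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // Hypothetical helper: collects capture group 1 from every match of `pattern` in `html`.
    public static List<string> ExtractHrefs(string html, string pattern)
    {
        var result = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern))
        {
            result.Add(match.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
                    + "<a href=\"a2\" title=\"t2\"> k2 </a>";

        // Greedy: .* runs to the last </a>, so both anchors fall into one match.
        List<string> greedy = ExtractHrefs(html, "<a href=\"([^\"]*)\".*</a>");

        // Lazy: .*? stops at the nearest </a>, giving one match per anchor.
        List<string> lazy = ExtractHrefs(html, "<a href=\"([^\"]*)\".*?</a>");

        Console.WriteLine(string.Join(", ", greedy)); // a1
        Console.WriteLine(string.Join(", ", lazy));   // a1, a2
    }
}
```

The greedy version yields one match whose first group is a1, which is exactly the behavior described in the question.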
Also, see: Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
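As a rough illustration of the parser-based alternative, the sketch below reads the same hrefs with System.Xml.Linq instead of a regex. It assumes the fragment is well-formed XML, which happens to hold for the two anchors in the question but often does not for real-world HTML, where a dedicated HTML parser would be needed. The names AnchorParser and ExtractHrefs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AnchorParser
{
    // Hypothetical helper: wraps the fragment in a dummy root element so it
    // parses as a single XML document, then reads href from each <a> element.
    // Throws System.Xml.XmlException if the fragment is not well-formed XML.
    public static List<string> ExtractHrefs(string fragment)
    {
        XElement root = XElement.Parse("<root>" + fragment + "</root>");
        return root.Descendants("a")
                   .Select(a => (string)a.Attribute("href"))
                   .Where(href => href != null)
                   .ToList();
    }

    public static void Main()
    {
        string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
                    + "<a href=\"a2\" title=\"t2\"> k2 </a>";

        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Because the parser tracks element boundaries itself, there is no greedy or lazy quantifier to get wrong, and attribute order inside the tag no longer matters.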