Question

I am trying to parse some XML tags whose data contains escaped strings. Some samples:

other tags with or without newlines
<tag name="abc1" type="bcd" value="test"><tag name="abc2" type="bcd" value="test">  
other tags other tags with or without newlines
<tag name="abc2" type="bcd" value="<w:test xmlns:wst=&quot;http://schemas.xmlsoap.org/ws/2005/02/trust&quot;><a xmlns:&quot;a:b:c:ddd:&quot;>XEduAjr8MoV</a></w:test>">

Basically, I need to find the values inside tags that are embedded in other text. Something like this:

<tag name="wwww" type="wwww" value="SOME HTML ESCAPED STRING WITH NEWLINES">

This is what I have:

<tag name="(?<name>\w*)" type="(?<id>\w*)" value="(?<value>.*)">

I am using this C# code:

var regex = new Regex(regstr, RegexOptions.Multiline);
MatchCollection mc = regex.Matches(sourcestring);

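I am running into problems with multiple matches because of (?<value>.*) when both tags are on the same line, as in <tag name="abc1" type="bcd" value="test"><tag name="abc2" type="bcd" value="test">. Is there any way to fix this? And is there a better approach?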

Answer 1

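Parsing XML files with regular expression patterns is not recommended. The reason is that XML involves (and often requires) deep nesting, which a regex pattern handles poorly.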
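For comparison, here is a minimal sketch of what a parser-based approach might look like with LINQ to XML. It assumes the input is well-formed XML (a single root element, closed tags, and `<` and `"` inside attribute values escaped as `&lt;` and `&quot;`), which the samples in the question are not. The `TagReader` class and `ReadValues` method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TagReader
{
    // Assumption: 'xml' is well-formed, with one root element and properly
    // escaped attribute values. The question's raw samples would not parse.
    public static List<string> ReadValues(string xml)
    {
        XElement root = XElement.Parse(xml);

        // Descendants finds <tag> elements at any nesting depth, in document
        // order. The parser unescapes entities in the attribute values.
        return root.Descendants("tag")
                   .Select(t => (string)t.Attribute("value"))
                   .ToList();
    }
}
```

With a parser, nesting depth stops being a concern, but the input has to be valid XML in the first place.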

Answer 2

It is well known that you should not use regex to parse XHTML, unless your tags are simple and do not contain an odd set of characters.

However, if you want to use a regex, for your specific example you have to use a non-greedy (or lazy) quantifier:

<tag name="(?<name>\w*?)" type="(?<id>\w*?)" value="(?<value>.*?)">
                                                       HERE ---^
also I put it here ---^------------------^ 
since it is more secure, but it is not needed
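
As a rough sketch, the lazy pattern could be wired into C# along these lines. The `TagExtractor` class and `Extract` method are hypothetical names. One assumption goes beyond the pattern itself: `RegexOptions.Singleline` is used instead of the question's `RegexOptions.Multiline`, because in .NET `.` only matches `\n` under `Singleline`, and the values are said to contain newlines.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Lazy quantifiers (*?) stop at the first closing "> rather than the last.
    private const string Pattern =
        @"<tag name=""(?<name>\w*?)"" type=""(?<id>\w*?)"" value=""(?<value>.*?)"">";

    public static List<(string Name, string Id, string Value)> Extract(string source)
    {
        // Singleline lets '.' match '\n', so a value may span several lines.
        var regex = new Regex(Pattern, RegexOptions.Singleline);
        var results = new List<(string Name, string Id, string Value)>();

        foreach (Match m in regex.Matches(source))
        {
            results.Add((m.Groups["name"].Value,
                         m.Groups["id"].Value,
                         m.Groups["value"].Value));
        }
        return results;
    }
}
```

Two tags on the same line then come back as two separate matches. Note that a value which itself contains the sequence `">` would still be cut short, since the lazy quantifier stops at the first occurrence.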


