Question

Good day,

Is there an alternative to using a regular expression to get everything inside a tag? Here is my code:

   MatchCollection matches = Regex.Matches(chek, "<bib-parsed>([^\000]*?)</bib-parsed>");

Here is the sample input:

   <bib-parsed>
   <cite>
   <pubinfo>
   <pub-year><i>1984</i></pub-year>
   <pub-place>Albuquerque</pub-place>
   <pub-name>Maxwell Museum of Anthropology and the University of New Mexico Press        </pub-name>
   </pubinfo>
   <bkinfo>
   <btl>The Galaz Ruin: A Prehistoric Mimbres Village in Southwestern New Mexico</btl>
   </bkinfo>
   </bib-parsed>

The sample above is matched, but if there is a "0" inside <pub-year>, the match fails. Is there any alternative for this? Thanks.

Answer 1

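Your input appears to be valid XML. If that is the case, use the XML parsers in System.Xml or System.Xml.Linq. They are very fast. For an input string that contains multiple chunks (like your example), using the System.Xml.Linq namespace objects: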

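   using System.Linq;
   using System.Xml.Linq;

   // e.Value is the concatenated text content of each <bib-parsed> element
   var bibChunks = XDocument.Parse(yourXmlString)
                            .Descendants("bib-parsed")
                            .Select(e => e.Value);

   foreach (string chunk in bibChunks) {
       // do stuff
   }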

That's all there is to it.
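If you need to stay with a regex, the failure on "0" seems to come from how C# reads the string literal rather than from the regex engine. In a regular (non-verbatim) C# string, `\0` is the escape for the null character and the `00` after it are two ordinary zero characters, so `[^\000]` means "any character except NUL or the digit 0". Below is a minimal sketch of one possible fix, assuming the intent was simply "match any character, including newlines": use `.*?` together with `RegexOptions.Singleline`. The `BibChunks.Extract` helper and the sample string are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the question's code.
public static class BibChunks
{
    // Returns the raw text between each <bib-parsed> ... </bib-parsed> pair.
    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        // Singleline makes '.' match newlines too, so no character class trick is needed.
        foreach (Match m in Regex.Matches(input, "<bib-parsed>(.*?)</bib-parsed>", RegexOptions.Singleline))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample with a '0' in the year.
        string chek = "<bib-parsed>\n<pub-year><i>2010</i></pub-year>\n</bib-parsed>";

        // Original pattern: "\0" is NUL and "00" are literal zeros, so '0' is excluded.
        Console.WriteLine(Regex.IsMatch(chek, "<bib-parsed>([^\000]*?)</bib-parsed>")); // False

        foreach (string chunk in BibChunks.Extract(chek))
        {
            Console.WriteLine(chunk);
        }
    }
}
```

Unlike `e.Value` in the XML approach, the captured group here keeps the inner markup as written. The regex will still break on nested or malformed tags, which is one reason the parser is the safer choice when the input really is XML.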
