What is wrong with this RegEx?

Asked: 2010-08-01 15:56:10

Tags: c# regex

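I am running a RegEx over an XML dump of Wikipedia articles.

The regex is: {{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+}}

I want to detect all text wrapped in {{ and }}. However, a plain search for {{ finds 56 occurrences, while the regex detects only 45.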

A sample block that it does not detect is {{cite journal | last = Heeks | first = Richard | year = 2008 | title = Meet Marty Cooper - the inventor of the mobile phone | journal = BBC | volume = 41 | issue = 6 | url = http://news.bbc.co.uk/2/hi/programmes/click_online/8639590.stm | pages = 26–33 | doi = 10.1109/MC.2008.192 }} ..

But it does detect {{cite web | title = Of Cigarettes and Cellphones | last = Ulyseas | first = Mark | date = 2008-01-18 | url = http://www.thebalitimes.com/2008/01/18/of-cigarettes-and-cellphones/ | publisher = The Bali Times | accessdate = 2008-02-24 }}

Can anyone spot my problem?

3 answers:

Answer 0 (score: 2)

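Some of the escapes are redundant, but I don't think that is the real problem.

I suggest trying \w instead of a-zA-Z0-9_, especially because in .NET regular expressions \w also matches Unicode letters (unless the regex runs in ECMAScript-compliant mode).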
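As a minimal sketch of the difference described above, the following compares \w with the explicit ASCII class on a single non-ASCII letter. The class name `WordClassDemo` and its helper methods are hypothetical names chosen for this illustration; only `Regex.IsMatch` and `RegexOptions.ECMAScript` come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // True if the whole string is exactly one character matched by \w.
    public static bool IsWordChar(string s, RegexOptions options = RegexOptions.None)
        => Regex.IsMatch(s, @"^\w$", options);

    // True if the whole string is exactly one character from the ASCII-only class.
    public static bool IsAsciiWordChar(string s)
        => Regex.IsMatch(s, @"^[a-zA-Z0-9_]$");

    public static void Main()
    {
        string eAcute = "\u00e9"; // the letter e with an acute accent

        Console.WriteLine(IsWordChar(eAcute));                          // True
        Console.WriteLine(IsAsciiWordChar(eAcute));                     // False
        Console.WriteLine(IsWordChar(eAcute, RegexOptions.ECMAScript)); // False
    }
}
```

Note that \w only widens the set of letters and digits. Punctuation that is missing from the class, such as the en dash in "26–33", would still not be matched by it.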

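Another option: if the text inside the braces cannot contain } (which the current pattern does not allow anyway), you could also use {{[^}]+}}.

[^...] is a negated character class. [^}] matches any character other than }.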
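The sketch below runs both patterns over abbreviated versions of the two samples from the question. The shortened sample strings and the names `TemplateFinder` and `Count` are assumptions made for this illustration. In the undetected sample as quoted, the page range "26–33" contains an en dash (U+2013), which the original character class does not list; that appears to be why the match fails, whereas the negated class accepts it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemplateFinder
{
    // Pattern from the question, unchanged.
    public const string OriginalPattern =
        @"{{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+}}";

    // Negated character class: anything except a closing brace.
    public const string NegatedPattern = @"{{[^}]+}}";

    // Counts the non-overlapping matches of pattern in text.
    public static int Count(string text, string pattern)
        => Regex.Matches(text, pattern).Count;

    public static void Main()
    {
        // Abbreviated samples; \u2013 is the en dash from "pages = 26–33".
        string journal = "{{cite journal | last = Heeks | pages = 26\u201333 | doi = 10.1109/MC.2008.192 }}";
        string web = "{{cite web | title = Of Cigarettes and Cellphones | date = 2008-01-18 }}";
        string text = journal + " .. " + web;

        Console.WriteLine(Count(text, OriginalPattern)); // 1
        Console.WriteLine(Count(text, NegatedPattern));  // 2
    }
}
```

As the answer notes, this relies on the template body never containing }. With nested templates, the match would end at the first }} it meets.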

References

regular-expressions.info/Character Class
