I ran into this unexpected behavior while trying to debug some pyparsing code:
from pyparsing import CharsNotIn, LineEnd, Literal, ZeroOrMore

string1 = "this is a test string : that behaves as I expect\n"
string2 = "this string does not behave as I expect\n"
field = CharsNotIn(":\n")
line = field + ZeroOrMore(Literal(":") + field) + LineEnd()
print(line.parseString(string1))
print(line.parseString(string2))
This produces the following output:
['this is a test string ', ':', ' that behaves as I expect', '\n']
['this string does not behave as I expect']
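For some reason the parser picks up the end-of-line character in string1, but it can't find it in string2. I don't even understand how string2 produces a match at all if it never consumed the end of line.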
This behavior seems to be specific to the end-of-line character, because using any other terminator character seems to work fine:
from pyparsing import CharsNotIn, Literal, ZeroOrMore

string1 = "this is a test string : that behaves as I expect*"
string2 = "this string also behaves as I expect*"
field = CharsNotIn(":*")
line = field + ZeroOrMore(Literal(":") + field) + Literal("*")
print(line.parseString(string1))
print(line.parseString(string2))
This produces:
['this is a test string ', ':', ' that behaves as I expect', '*']
['this string also behaves as I expect', '*']
Answer (score: 1)
Print the line expression to see the pseudo-regular-expression it matches against:
>>> print line
{!W:(:
) [{":" !W:(:
)}]... LineEnd}
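If I'm reading this correctly, it looks for a run of characters that are neither colons nor newlines, stopping at the first newline (for string2 in your example, that run is the whole line). Then it looks for a colon followed by another field, if there is one (there isn't), and finally a line end. (The line breaks inside !W:(...) in the printed output are the literal newline character from the ":\n" exclusion set.) My guess is that the newline is somehow being stripped before LineEnd sees it, rather than your assumption (that the string shouldn't match if the newline isn't matched) being wrong.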
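To make that guess concrete, here is a simplified model in plain Python (no pyparsing needed) of where the parser stands when the final LineEnd gets its turn. It assumes, based on my reading of the pyparsing 2.x source rather than anything verified here, that ZeroOrMore inherits the default whitespace set (which includes "\n") from Literal(":"), skips that whitespace once before its first attempt, and keeps the skipped position even when nothing repeats. DEFAULT_WHITE, skip_ws, field_end, zero_or_more_colon_field and position_before_line_end are hypothetical names for illustration only, not pyparsing APIs.

```python
# Simplified, stdlib-only model of how pyparsing walks the grammar
#   field + ZeroOrMore(Literal(":") + field) + LineEnd()
# ASSUMPTION: DEFAULT_WHITE mirrors pyparsing's default whitespace set;
# every helper below is hypothetical and is NOT a pyparsing API.
DEFAULT_WHITE = " \n\t\r"


def skip_ws(text: str, loc: int) -> int:
    """Advance past default whitespace (pyparsing's 'preParse' step)."""
    while loc < len(text) and text[loc] in DEFAULT_WHITE:
        loc += 1
    return loc


def field_end(text: str, loc: int, excluded: str) -> int:
    """Like CharsNotIn: consume one or more chars not in `excluded`.
    CharsNotIn does not skip leading whitespace, so neither does this."""
    end = loc
    while end < len(text) and text[end] not in excluded:
        end += 1
    if end == loc:
        raise ValueError(f"no field at position {loc}")
    return end


def zero_or_more_colon_field(text: str, loc: int, excluded: str) -> int:
    """Model of ZeroOrMore(":" + field).

    The wrapper skips whitespace once *before* its first attempt and keeps
    that position even when zero repetitions match; a later attempt that
    fails falls back to the position it started from."""
    loc = skip_ws(text, loc)  # a trailing "\n" can be swallowed right here
    while True:
        probe = skip_ws(text, loc)
        if probe >= len(text) or text[probe] != ":":
            return loc  # failed attempt: roll back to `loc`
        try:
            loc = field_end(text, probe + 1, excluded)
        except ValueError:
            return loc


def position_before_line_end(text: str, excluded: str = ":\n") -> int:
    """Where the parser stands when the final LineEnd/Literal runs."""
    loc = field_end(text, 0, excluded)
    return zero_or_more_colon_field(text, loc, excluded)


if __name__ == "__main__":
    string1 = "this is a test string : that behaves as I expect\n"
    string2 = "this string does not behave as I expect\n"
    string3 = "this string also behaves as I expect*"
    # Print what is left for the terminator to match.
    print(repr(string1[position_before_line_end(string1):]))  # '\n'
    print(repr(string2[position_before_line_end(string2):]))  # ''
    print(repr(string3[position_before_line_end(string3, ":*"):]))  # '*'
```

Under this model, string1 keeps its newline because ZeroOrMore's up-front skip happens at the ':' and its later failed attempt is rolled back, while string2's newline is swallowed by that first skip. A '*' is not whitespace, which would explain why the second grammar behaves as expected.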
>>> print line.parseString('xyzyy')
['xyzyy']
That does still leave the question of why it matches even without a newline...
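One likely answer to that remaining question: in pyparsing, LineEnd also succeeds at the very end of the input, returning no token. The sketch below models that rule in plain Python; line_end is a hypothetical helper, and the model is a simplification of LineEnd's behavior, not its actual code.

```python
# Simplified, stdlib-only model of pyparsing's LineEnd matching rule.
# ASSUMPTION: this mirrors LineEnd's behavior as I understand it;
# `line_end` is a hypothetical helper, NOT a pyparsing API.
from typing import List, Tuple


def line_end(text: str, loc: int) -> Tuple[int, List[str]]:
    """Return (next_loc, tokens), or raise ValueError if no line end here."""
    if loc < len(text):
        if text[loc] == "\n":
            return loc + 1, ["\n"]  # a real newline yields a "\n" token
        raise ValueError(f"expected end of line at position {loc}")
    if loc == len(text):
        return loc + 1, []  # end of input also counts, but yields no token
    raise ValueError(f"position {loc} is past the end of the input")


if __name__ == "__main__":
    # 'xyzyy': the field consumes everything, so LineEnd sits at the end.
    print(line_end("xyzyy", 5))  # (6, [])
    # A newline that is still present produces a token.
    print(line_end("abc\n", 3))  # (4, ['\n'])
```

If that is what's going on, string2's newline is stripped before LineEnd runs, and LineEnd then succeeds at end of input with no token, which would account for the missing '\n' in the second output line. A workaround worth trying (not tested here) is calling ParserElement.setDefaultWhitespaceChars(" \t") before building the grammar, so that newlines are no longer skipped as whitespace.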