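I'm trying to clean up some code by removing leading or trailing whitespace with PyParsing. Removing the leading whitespace was easy, because I could use the FollowedBy subclass, which matches a string without including it in the result.
Now I need the same thing for whitespace that comes after the string I identify.
Here is a small example: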
from pyparsing import *
insource = """
annotation (Documentation(info="
<html>
<b>FOO</b>
</html>
"));
"""
# Working replacement:
HTMLStartref = OneOrMore(White(' \t\n')) + (FollowedBy(CaselessLiteral('<html>')))
## Not working because of non-existing "LeadBy"
# HTMLEndref = LeadBy(CaselessLiteral('</html>')) + OneOrMore(White(' \t\n')) + FollowedBy('"')
out = Suppress(HTMLStartref).transformString(insource)
out2 = Suppress(HTMLEndref).transformString(out)
The output is:
>>> print(out)
annotation (Documentation(info="<html>
<b>FOO</b>
</html>
"));
whereas it should be:
>>> print(out2)
annotation (Documentation(info="<html>
<b>FOO</b>
</html>"));
I looked through the documentation, but found neither a "LeadBy" counterpart to FollowedBy nor another way to achieve this.
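Answer 0 (score: 2)
What you are asking for is a "lookbehind", that is, matching only when the text is preceded by a particular pattern. I don't have an explicit class for that right now, but for what you want to do, you can still transform from left to right: match the leading part too, and instead of suppressing it, keep it and suppress only the whitespace.
Here are a couple of ways to solve the problem: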
# define expressions to match leading and trailing
# html tags, and just suppress the leading or trailing whitespace
opener = White().suppress() + Literal("<html>")
closer = Literal("</html>") + White().suppress()
# define a single expression to match either opener
# or closer - have to add leaveWhitespace() call so that
# we catch the leading whitespace in opener
either = opener|closer
either.leaveWhitespace()
print(either.transformString(insource))
# alternative, if you know what the tag will look like:
# match 'info=<some double quoted string>', and use a parse
# action to extract the contents within the quoted string,
# call strip() to remove leading and trailing whitespace,
# and then restore the original '"' characters (which are
# auto-stripped by the QuotedString class by default)
infovalue = QuotedString('"', multiline=True)
infovalue.setParseAction(lambda t: '"' + t[0].strip() + '"')
infoattr = "info=" + infovalue
print(infoattr.transformString(insource))
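To make the "lookbehind" idea concrete, here is a minimal sketch using Python's standard re module rather than pyparsing. This is a swapped-in technique for illustration only, not something the answer above proposes: re supports lookahead with (?=...) and fixed-width lookbehind with (?<=...). The names leading and trailing are made up for this example.

```python
import re

insource = """
annotation (Documentation(info="
<html>
<b>FOO</b>
</html>
"));
"""

# Whitespace that is followed by '<html>' (lookahead, the role
# FollowedBy plays in the question). The tag itself is not consumed.
leading = re.compile(r'\s+(?=<html>)', re.IGNORECASE)

# Whitespace that is preceded by '</html>' (lookbehind, the role of the
# hypothetical "LeadBy") and followed by a double quote.
trailing = re.compile(r'(?<=</html>)\s+(?=")', re.IGNORECASE)

out = leading.sub('', insource)   # strip whitespace before <html>
out2 = trailing.sub('', out)      # strip whitespace after </html>
print(out2)
```

Only the whitespace runs are replaced; the tags and the quote character are checked but never removed, which is the same effect the opener/closer expressions achieve by keeping the tag and suppressing just the White() part.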