How to get rid of trailing whitespace

Date: 2019-07-26 18:34:08

Tags: pyparsing

I am trying to nail down an ISC-style (Bind9 / DHCP) configuration parser in pyparsing (after a long search on GitHub, Google, and elsewhere).

ISC-style configuration files have the following quirky textual properties:

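  • All C / C++ / Bash comment styles
  • Include-file support
  • Statements are terminated by a semicolon before the next keyword
  • The semicolon may or may not sit directly next to the preceding token
  • Multi-line support (the semicolon may appear several lines later)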

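The closest existing grammar to ISC-style configuration syntax (also written in pyparsing) is one for NGINX, which I found on GitHub. Adopting its approach, however, would mean giving up pyparsing's automatic whitespace handling, which I would like to keep if possible.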

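Once I started running input-fuzzing unit tests, the parse tree produced by my existing pyparsing grammar began to fall apart. The parsed domain names now carry trailing whitespace: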

[['server', 'example.com']]
[['server', 'example.com ']]
[['server', 'example.com      ']]
[['server', 'example.com']]
[['server', 'example.com ']]
[['server', 'example.com     ']]
[['server', 'example.com    ']]
[['server', 'example.com      ']]
[['server', 'example.com                     ']]
['options', ['server', 'example.com     '], ['server2', 'example2.net   ']]

This is the grammar snippet I have:

lbrack = Literal("{").suppress()
rbrack = Literal("}").suppress()
period = Literal(".")
semicolon = Literal(";").suppress()

domain_name = Word(srange("[0-9A-Za-z]"), min=1, max=63)
domain_name.setName("domain")
fqdn = originalTextFor(domain_name - \
                       originalTextFor(period - \
                                       domain_name) * (0, 16) - \
                       Optional(period))
fqdn.setName("fully-qualified domain name")
orig_fqdn = originalTextFor(fqdn).setName('FQDN')
options_server = Group(Keyword("server") - fqdn - semicolon)
options_server2 = Group(Keyword("server2") - fqdn - semicolon)
options_group = Optional(options_server) & \
                      Optional(options_server2)

I still cannot get rid of the trailing whitespace.

I tried the following, to no avail:

iwsp = Optional(Word("[ \t]")).suppress() # Ignore WhiteSPace
options_server = Group(Keyword("server") - fqdn - iwsp - semicolon)

What am I doing wrong?

A complete, working Python snippet follows:

#!/usr/bin/env python3

from pyparsing import Literal, Word, srange, \
    originalTextFor, Optional, ParseException, \
    OneOrMore, Keyword, ZeroOrMore, \
    ParseSyntaxException, Group

lbrack = Literal("{").suppress()
rbrack = Literal("}").suppress()
period = Literal(".")
semicolon = Literal(";").suppress()

domain_name = Word(srange("[0-9A-Za-z]"), min=1, max=63)
domain_name.setName("domain")
fqdn = originalTextFor(domain_name - \
                       originalTextFor(period - \
                                       domain_name) * (0, 16) - \
                       Optional(period))
fqdn.setName("fully-qualified domain name")
orig_fqdn = originalTextFor(fqdn).setName('FQDN')
options_server = Group(Keyword("server") - fqdn - semicolon)
options_server2 = Group(Keyword("server2") - fqdn - semicolon)
options_group = Optional(options_server) & \
                      Optional(options_server2) \
                      # | had a bunch of other options commented out
options_clause = Keyword("options") - \
                     lbrack - \
                     options_group - \
                     rbrack - \
                     semicolon
statement = options_clause # | had a bunch of other clauses commented out
isc_style_syntax = statement


def parse_me(parse_element, test_data):

    greeting = parse_element.parseString(test_data, parseAll=True)
    greeting.pprint(indent=4)


if __name__ == '__main__':
    parse_me(options_server, "server example.com;")
    parse_me(options_server, "server example.com ;")
    parse_me(options_server, "server example.com\t;")
    parse_me(options_server, "server\texample.com;")
    parse_me(options_server, "server\texample.com ;")
    parse_me(options_server, "server\texample.com\t;")
    parse_me(options_server, "server     example.com    ;")
    parse_me(options_server, "server\t \texample.com \t ;")
    parse_me(options_server, "server\t\t\texample.com\t\t\t;")
    parse_me(statement, "options { server\t \texample.com \t;\n server2\t\t\t\t\t\t\t\t\t\t\t\t example2.net\t;\n}\n ;") 

1 Answer:

Answer 0 (score: 1)

The problem is here:

fqdn = originalTextFor(domain_name - \
                   originalTextFor(period - \
                                   domain_name) * (0, 16) - \
                   Optional(period))

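Because of the repetition and the trailing Optional bit, it appears that originalTextFor keeps reading and pulling in characters until it actually fails inside the repetition. But if you change it to: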

fqdn = Combine(domain_name - \
                   originalTextFor(period + \
                                   domain_name) * (0, 16) - \
                   Optional(period))

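then your fqdn will contain only non-whitespace characters. (Combine is not in the question's import list, so it has to be added to the names imported from pyparsing.)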
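To see the effect in isolation, here is a minimal sketch using a simplified stand-in for the question's grammar. The names label, server, and parse_server are illustrative and not from the original post, and the sketch uses plain + with ZeroOrMore instead of the - and * (0, 16) forms above. It uses the camelCase method names found in this thread; pyparsing 3 also offers snake_case equivalents such as parse_string.

```python
from pyparsing import Combine, Group, Keyword, Literal, Word, ZeroOrMore, alphanums

# Simplified stand-ins for the grammar in the question (names are illustrative).
label = Word(alphanums, max=63)
period = Literal(".")
semicolon = Literal(";").suppress()

# Combine requires its pieces to be adjacent and joins them into one token,
# so whitespace after the last label never becomes part of the result.
fqdn = Combine(label + ZeroOrMore(period + label))
server = Group(Keyword("server") + fqdn + semicolon)


def parse_server(text):
    """Parse one 'server <fqdn>;' statement and return plain nested lists."""
    return server.parseString(text, parseAll=True).asList()


if __name__ == "__main__":
    for sample in ("server example.com;", "server example.com    ;"):
        print(parse_server(sample))
```

Both samples should yield the same token list, with no padding after the domain name, while whitespace before the semicolon is still skipped automatically.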

ParserElement also has its own runTests method, which makes it easier to write quick tests against multiple inputs:

options_server.runTests("""
    server example.com;
    server example.com   ;
    server example.com   .z;
    server example.com.;
""")

This prints:

server example.com;
[['server', 'example.com']]
[0]:
  ['server', 'example.com']


server example.com   ;
[['server', 'example.com']]
[0]:
  ['server', 'example.com']


server example.com   .z;
                     ^(FATAL)
FAIL: Expected ";" (at char 21), (line:1, col:22)


server example.com.;
                   ^(FATAL)
FAIL: Expected domain (at char 19), (line:1, col:20)

(None of your tab test cases are really being exercised, because by default pyparsing expands tabs to spaces before parsing. You have to call expr.parseWithTabs() to disable this.)
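The tab expansion can be observed with a small sketch. It assumes a hypothetical expression that insists on a literal tab between two words; make_expr and parses are illustrative helpers, not part of pyparsing or of the original post.

```python
from pyparsing import ParseException, White, Word, alphas


def make_expr():
    # Two words that must be separated by a literal tab character.
    return Word(alphas) + White("\t") + Word(alphas)


def parses(expr, text):
    """Return the token list, or None if the text does not match."""
    try:
        return expr.parseString(text, parseAll=True).asList()
    except ParseException:
        return None


# Default behaviour: the tab is expanded to spaces before parsing,
# so the literal-tab requirement can never be satisfied.
default_result = parses(make_expr(), "a\tb")

# With parseWithTabs() the input is left untouched and the tab is seen.
tabs_result = parses(make_expr().parseWithTabs(), "a\tb")

if __name__ == "__main__":
    print(default_result)
    print(tabs_result)
```

Note that parseWithTabs() has to be called on the expression whose parseString is actually invoked, since that is where the expansion happens.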