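I am trying to pin down an ISC-style (Bind9 / DHCP) configuration parser in pyparsing, after a long search on GitHub, Google, and elsewhere turned up nothing ready-made.
ISC-style configuration files have the following quirky text properties:
The closest existing grammar to ISC-style configuration syntax (also written in pyparsing) is one for NGINX that I found on GitHub. Adopting its approach would mean giving up pyparsing's automatic whitespace handling, which I would like to keep if at all possible.
Once I started running input-fuzzing unit tests, the parse tree produced by my existing pyparsing grammar started to fall apart: the parsed domain name keeps its trailing whitespace.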
[['server', 'example.com']]
[['server', 'example.com ']]
[['server', 'example.com ']]
[['server', 'example.com']]
[['server', 'example.com ']]
[['server', 'example.com ']]
[['server', 'example.com ']]
[['server', 'example.com ']]
[['server', 'example.com ']]
['options', ['server', 'example.com '], ['server2', 'example2.net ']]
Here is the relevant grammar snippet:
lbrack = Literal("{").suppress()
rbrack = Literal("}").suppress()
period = Literal(".")
semicolon = Literal(";").suppress()
domain_name = Word(srange("[0-9A-Za-z]"), min=1, max=63)
domain_name.setName("domain")
fqdn = originalTextFor(domain_name - \
                       originalTextFor(period - \
                                       domain_name) * (0, 16) - \
                       Optional(period))
fqdn.setName("fully-qualified domain name")
orig_fqdn = originalTextFor(fqdn).setName('FQDN')
options_server = Group(Keyword("server") - fqdn - semicolon)
options_server2 = Group(Keyword("server2") - fqdn - semicolon)
options_group = Optional(options_server) & \
                Optional(options_server2)
I still cannot get rid of the trailing whitespace.
I tried the following, to no avail:
iwsp = Optional(Word("[ \t]")).suppress() # Ignore WhiteSPace
options_server = Group(Keyword("server") - fqdn - iwsp - semicolon)
What am I doing wrong?
A complete, working Python snippet is attached below:
#!/usr/bin/env python3
from pyparsing import Literal, Word, srange, \
    originalTextFor, Optional, ParseException, \
    OneOrMore, Keyword, ZeroOrMore, \
    ParseSyntaxException, Group
lbrack = Literal("{").suppress()
rbrack = Literal("}").suppress()
period = Literal(".")
semicolon = Literal(";").suppress()
domain_name = Word(srange("[0-9A-Za-z]"), min=1, max=63)
domain_name.setName("domain")
fqdn = originalTextFor(domain_name - \
                       originalTextFor(period - \
                                       domain_name) * (0, 16) - \
                       Optional(period))
fqdn.setName("fully-qualified domain name")
orig_fqdn = originalTextFor(fqdn).setName('FQDN')
options_server = Group(Keyword("server") - fqdn - semicolon)
options_server2 = Group(Keyword("server2") - fqdn - semicolon)
options_group = Optional(options_server) & \
                Optional(options_server2)
# | had a bunch of other options commented out
options_clause = Keyword("options") - \
                 lbrack - \
                 options_group - \
                 rbrack - \
                 semicolon
statement = options_clause # | had a bunch of other clauses commented out
isc_style_syntax = statement
def parse_me(parse_element, test_data):
    greeting = parse_element.parseString(test_data, parseAll=True)
    greeting.pprint(indent=4)
if __name__ == '__main__':
    parse_me(options_server, "server example.com;")
    parse_me(options_server, "server example.com ;")
    parse_me(options_server, "server example.com\t;")
    parse_me(options_server, "server\texample.com;")
    parse_me(options_server, "server\texample.com ;")
    parse_me(options_server, "server\texample.com\t;")
    parse_me(options_server, "server example.com ;")
    parse_me(options_server, "server\t \texample.com \t ;")
    parse_me(options_server, "server\t\t\texample.com\t\t\t;")
    parse_me(statement, "options { server\t \texample.com \t;\n server2\t\t\t\t\t\t\t\t\t\t\t\t example2.net\t;\n}\n ;")
Answer 0 (score: 1)
The problem is here:
fqdn = originalTextFor(domain_name - \
                       originalTextFor(period - \
                                       domain_name) * (0, 16) - \
                       Optional(period))
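Because of the repetition and the trailing Optional bit, it seems that originalTextFor keeps reading and pulling in characters until the repetition actually fails. But if you change it to: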
from pyparsing import Combine
fqdn = Combine(domain_name - \
               originalTextFor(period + \
                               domain_name) * (0, 16) - \
               Optional(period))
then your fqdn will contain only non-whitespace characters.
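As a rough, self-contained check of the idea above, the sketch below rebuilds only the `server` statement with a `Combine`-based `fqdn`. It is a simplified variant, not the exact grammar from the question: it uses `+` everywhere instead of `-` and drops the inner `originalTextFor`, on the assumption that `Combine` already joins the matched pieces into one string. The helper `parse_server` is a hypothetical name introduced only for this illustration.

```python
from pyparsing import (Combine, Group, Keyword, Literal, Optional,
                       ParseException, Word, srange)

period = Literal(".")
semicolon = Literal(";").suppress()
domain_name = Word(srange("[0-9A-Za-z]"), min=1, max=63)

# Combine only matches when its pieces are adjacent in the input, so
# whitespace before the semicolon cannot end up inside the fqdn token.
fqdn = Combine(domain_name + (period + domain_name) * (0, 16) + Optional(period))
options_server = Group(Keyword("server") + fqdn + semicolon)


def parse_server(text):
    """Return the parsed tokens as a plain list, or None if parsing fails."""
    try:
        return options_server.parseString(text, parseAll=True).asList()
    except ParseException:
        return None


if __name__ == "__main__":
    for sample in ("server example.com;",
                   "server example.com   ;",
                   "server example.com  .z;"):
        print(repr(sample), "->", parse_server(sample))
```

Whitespace between the keyword, the name, and the semicolon is still skipped automatically here; only the inside of `fqdn` is required to be contiguous.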
ParserElement also has its own runTests method, which makes it easier to write quick tests for multiple inputs:
options_server.runTests("""
server example.com;
server example.com ;
server example.com .z;
server example.com.;
""")
This prints:
server example.com;
[['server', 'example.com']]
[0]:
['server', 'example.com']
server example.com ;
[['server', 'example.com']]
[0]:
['server', 'example.com']
server example.com .z;
^(FATAL)
FAIL: Expected ";" (at char 21), (line:1, col:22)
server example.com.;
^(FATAL)
FAIL: Expected domain (at char 19), (line:1, col:20)
(None of your tab test cases are really being exercised, because by default pyparsing expands tabs to spaces before parsing. You have to call expr.parseWithTabs() to disable this.)
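To see the tab expansion in isolation, here is a small sketch. The `Regex(r".+")` expressions are only hypothetical stand-ins that capture the rest of the line, so that the difference between the default behaviour and `parseWithTabs()` shows up in the result; they are not part of the grammar in the question.

```python
from pyparsing import Regex

# Hypothetical helper expressions: each grabs everything up to the end of the line.
expand_tabs = Regex(r".+")                # default: tabs are expanded before parsing
keep_tabs = Regex(r".+").parseWithTabs()  # tabs reach the grammar unchanged

sample = "server\texample.com\t;"

expanded = expand_tabs.parseString(sample)[0]
kept = keep_tabs.parseString(sample)[0]

print(repr(expanded))  # no tab characters left in the matched text
print(repr(kept))      # original tabs preserved
```

If the tab test cases are meant to exercise the grammar itself, the same call would go on whichever expression `parseString` is invoked on, for example `options_server.parseWithTabs()`.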