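I am using pyparsing to ingest the gEDA schematic/symbol file format. Most of it is straightforward, but I am not sure how to match a number of following lines when that number is given by an integer field on the initial line.
Text objects have the following format: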
(other objects)
T x y color size vis snv angle align num_lines
Text line one
Line two of the text
Finally, the 'num_lines'th line
(other objects)
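num_lines is an integer. This pattern is also used by several other object types.
As a workaround, I defined the text lines as anything that does not match a valid object type. Technically, though, lines that look like objects are allowed inside a text object.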
text_meta = Type("T") + coord + color + size + visibility + show_name_value \
+ angle + alignment + num_lines + EOL
text_data_line = ~obj + LineStart() + SkipTo(LineEnd()) + EOL
text_data = Group(OneOrMore(text_data_line)).setResultsName('text')
text_data = text_data.setParseAction(lambda t: '\n'.join(t[0]))
text = text_meta + text_data
Generating a matching rule on the fly, such as:
def genLineMatcher(n):
    return (LineStart() + SkipTo(LineEnd()) + EOL)*n
is on the table, but I am not sure how to specify the rule.
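Answer 0 (score: 0)
Generating a matching rule on the fly...
You are actually on the right track. The way to create a rule dynamically is to define the variable-length expression as a Forward(), and then insert the actual rule from a parse action that runs when the count field is parsed.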
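A minimal sketch of that idea, assuming each counted line is simply "end of the previous line, then the rest of the next line". The names counted_lines and set_body are made up for illustration and are not part of pyparsing:

```python
from pyparsing import And, Empty, Forward, Group, LineEnd, Word, nums, restOfLine


def counted_lines():
    """Hypothetical helper: match an integer N, then the N lines after it."""
    body = Forward()  # placeholder; the real rule is plugged in at parse time
    line = LineEnd().suppress() + restOfLine

    def set_body(s, loc, toks):
        n = int(toks[0])
        # Build a rule for exactly n lines and insert it into the placeholder.
        body << (Group(And([line] * n)) if n else Group(Empty()))
        return []  # drop the count itself from the results

    count = Word(nums).setParseAction(set_body, callDuringTry=True)
    return count + body


result = counted_lines().parseString("2\nfirst line\nsecond line\nnot included\n")
print(result.asList())
```

The Group is placed inside the Forward on purpose: the Forward takes over the whitespace settings of whatever is assigned to it, so the newline after the count is left for LineEnd to match instead of being skipped as ordinary whitespace.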
Fortunately, pyparsing already implements this in the helper method countedArray. If you change your expression to:
text_meta = (Type("T") + coord + color + size + visibility + show_name_value +
angle + alignment + countedArray(EOL + restOfLine)("lines"))
I think this will do what you want. You can then retrieve the array of lines using the "lines" results name.
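To try this without the rest of the grammar, here is a cut-down sketch. The header is reduced to "T" plus a single integer field, which is an assumption for illustration only, and EOL is assumed to be LineEnd().suppress(). As far as I know, pyparsing 2.x nests the counted items in their own list while 3.x returns them inline, so the sketch flattens one level to behave the same either way:

```python
from pyparsing import LineEnd, Literal, Word, countedArray, nums, restOfLine

EOL = LineEnd().suppress()  # assumed definition of the question's EOL

# Hypothetical cut-down text object: "T", one integer field, then num_lines.
text = Literal("T") + Word(nums) + countedArray(EOL + restOfLine)

sample = "T 41600 2\nThis is line 1\nline 2 is here...\nL 0 0 1 1\n"
tokens = text.parseString(sample).asList()

# Flatten one level so the result looks the same whether or not the
# installed pyparsing version nests the counted items in their own list.
flat = []
for tok in tokens:
    if isinstance(tok, list):
        flat.extend(tok)
    else:
        flat.append(tok)
print(flat)
```

Because the count is followed by EOL + restOfLine rather than restOfLine + EOL, each repetition first consumes the newline that ends the previous line, so no empty leading entry has to be stripped afterwards.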
Answer 1 (score: 0)
The pyparsing helper function countedArray(expr) is almost what is needed. Here are the parser definition and a modified version of the helper:
from pyparsing import And, Forward, Group, Word, empty, nums

def numLinesList(expr, name=None):
    """Helper to snarf an end-of-line integer and match 'expr' N times after.

    Almost exactly like pyparsing.countedArray.
    Matches patterns of the form::

        ... num_lines
        line one
        line two
        num_lines'th line
    """
    arrayExpr = Forward()

    def numLinesAction(s, l, t):
        n = int(t[0])
        # n + 1 because the first match is the (empty) remainder of the header line
        arrayExpr << (n and Group(And([expr]*(n+1))) or Group(empty))
        return []

    matcher = Word(nums).setParseAction(numLinesAction, callDuringTry=True) \
        + arrayExpr
    # remove first empty string
    matcher.addParseAction(lambda t: [t[0][1:]])
    if name:
        matcher = matcher.setResultsName(name)
    return matcher
text_meta = Type("T") + coord + color + size + visibility + show_name_value \
+ angle + alignment
text_data_line = SkipTo(LineEnd()) + EOL
text_data = numLinesList(text_data_line, 'text')
text = text_meta + text_data
Input excerpt:
...
T 41600 47800 9 10 1 0 0 0 2
This is line 1
line 2 is here...
T 41600 47000 9 10 1 0 0 0 2
Another first line
second line foo
Output:
['T', 41600, 47800, '9', 10, True, '0', 0, 0, ['This is line 1', 'line 2 is here...']]
['T', 41600, 47000, '9', 10, True, '0', 0, 0, ['Another first line', 'second line foo']]