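I am building a parser for the IBM Rhapsody sbs file format. Unfortunately, the recursive part does not work as expected. The rule pp.Word(pp.printables + " ") is probably the problem, because it also matches ; and {}. But ; at least can also be part of a value.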
import pyparsing as pp
import pprint
TEST = r"""{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
"""
def flat(loc, toks):
    # unwrap a group that holds a single quoted string
    if len(toks[0]) == 1:
        return toks[0][0]

assignment = pp.Suppress("-") + pp.Word(pp.alphanums + "_") + pp.Suppress("=")
value = pp.OneOrMore(
    pp.Group(assignment + (
        pp.Group(pp.OneOrMore(
            pp.QuotedString('"', escChar="\\", multiline=True) +
            pp.Suppress(";"))).setParseAction(flat) |
        pp.Word(pp.alphas) + pp.Suppress(";") |
        pp.Word(pp.printables + " ")
    ))
)
expr = pp.Forward()
expr = pp.Suppress("{") + pp.Word(pp.alphas) + (
    value | (assignment + expr) | expr
) + pp.Suppress("}")
expr = expr.ignore(pp.pythonStyleComment)
print(TEST)
pprint.pprint(expr.parseString(TEST).asList())
Output:
% python prase.py
{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
['foo',
['key', 'bla'],
['value', '1243; 1233; 1235;'],
['_hans', 'hammer\n time'],
['HaMer', '765; 786; 890;'],
['value', '\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n '],
['_mText', '12.11.2015::13:20:0;'],
['value', ['war', 'fist']],
['_obacht', 'fish,car,button'],
['_id', 'gibml c0d8-4535-898f-968362779e07;'],
['bam', '{ boing'],
['key', 'bla']]
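The last entries of that output, ['bam', '{ boing'], suggest that the catch-all rule consumes the opening brace of the nested block before the recursive alternative gets a chance. A minimal sketch that tries the rule in isolation (the name catch_all and the sample string are made up for illustration):

```python
import pyparsing as pp

# The catch-all rule from the question, tested on its own.
catch_all = pp.Word(pp.printables + " ")

# A nested block as it appears after "- bam =" in the test data.
nested = "{ boing\n- key = bla;\n}"

# Word stops only at a character outside its set; here that is the newline,
# so the brace and the type name are returned as one plain string.
swallowed = catch_all.parseString(nested).asList()
print(swallowed)
```

Since "{", "}" and ";" are all printable characters, nothing in this rule can tell a nested object apart from an ordinary value.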
Answer 0 (score: 2)
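The difficult part was the very general token that I named 'value_word', which is pretty much any group of characters, as long as it does not start with '-', '{' or '}'. The negative lookaheads in the definition of 'value_word' take care of this. I think a key point here is that I was able to leave ' ' out of the valid characters for a 'value_word' and instead let pyparsing do its default whitespace skipping, so that one or more 'value_word's can make up an 'attr_value'.
The final kicker (not found in your test case, but in the example you linked to) was this line for an attribute 'assignment':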
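The lookahead idea can be tried on its own. The following is a small sketch, not the full grammar: the name words and the sample strings are assumptions made for illustration, while value_word is defined the same way as in the grammar in this answer.

```python
from pyparsing import OneOrMore, Suppress, Word, printables

DASH, LBRACE, RBRACE = map(Suppress, "-{}")

# A word may not start with '-', '{' or '}', and never contains ';'.
value_word = ~DASH + ~LBRACE + ~RBRACE + Word(printables, excludeChars=";")

# Whitespace between words is skipped by pyparsing's default behavior,
# so the space does not have to be part of the character set.
words = OneOrMore(value_word)

# The '-' inside the second word is fine; only a leading '-' is rejected.
id_words = words.parseString("gibml c0d8-4535-898f-968362779e07; - next = 1;").asList()

# Matching stops in front of the closing brace.
before_brace = words.parseString("bla } trailing").asList()

print(id_words)
print(before_brace)
```

Because the lookaheads only inspect the start of each word, values such as the id above can still contain dashes.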
- m_pParent = ;
So attr_value also had to allow an empty string.
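As a rough illustration of the empty case, here is a reduced sketch that wraps the value list in Optional. The names simple_word, simple_value and simple_defn are hypothetical and leave out quoted strings, GUIDs and nested objects; the full grammar follows after it.

```python
from pyparsing import (Group, Optional, Suppress, Word,
                       alphanums, alphas, delimitedList, printables)

SEMI, EQ, DASH = map(Suppress, ";=-")
ident = Word(alphas + "_", alphanums + "_")

# Simplified value: any printable run without ';'.
simple_word = Word(printables, excludeChars=";")

# Optional lets the list be absent, so a bare ';' is still a valid value.
simple_value = Optional(delimitedList(simple_word, ";")) + SEMI
simple_defn = Group(DASH + ident + EQ + Group(simple_value))

empty = simple_defn.parseString("- m_pParent = ;").asList()
filled = simple_defn.parseString("- HaMer = 765; 786; 890;").asList()

print(empty)
print(filled)
```

The trailing ';' is not taken by the delimited list, because no further value follows it, so it is left for the terminating SEMI.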
from pyparsing import *

# punctuation is matched but left out of the results
LBRACE,RBRACE,SEMI,EQ,DASH = map(Suppress,"{};=-")
ident = Word(alphas + '_', alphanums+'_').setName("ident")
guid = Group('GUID' + Combine(Word(hexnums)+('-'+Word(hexnums))*4))
qs = QuotedString('"', escChar="\\", multiline=True)
character_literal = Combine("'" + oneOf(list(printables+' ')) + "'")
# printable characters except ';', not starting with '-', '{' or '}'
value_word = ~DASH + ~LBRACE + ~RBRACE + Word(printables, excludeChars=';').setName("value_word")
value_atom = guid | qs | character_literal | value_word

object_ = Forward()
# either nested objects, or a (possibly empty) ';'-delimited list ending in ';'
attr_value = (OneOrMore(object_)
              | Optional(delimitedList(Group(value_atom+OneOrMore(value_atom))|value_atom, ';')) + SEMI)
attr_value.setName("attr_value")
attr_defn = Group(DASH + ident("name") + EQ + Group(attr_value)("value"))
attr_defn.setName("attr_defn")

# '<<=' fills in the Forward so that objects can nest recursively
object_ <<= Group(
    LBRACE + ident("type") +
    Group(ZeroOrMore(attr_defn | object_))("attributes") +
    RBRACE
)

object_.parseString(TEST).pprint()
For your test string, this gives:
[['foo',
[['key', ['bla']],
['value', ['1243', '1233', '1235']],
['_hans', ['hammer\n time']],
['HaMer', ['765', '786', '890']],
['value', ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']],
['_mText', ['12.11.2015::13:20:0']],
['value', ['war', 'fist']],
['_obacht', ['fish,car,button']],
['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]],
['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]]]]
I also added results names that might help you in processing these structures. Using object_.parseString(TEST).dump() gives this output:
[['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n time']], ...
[0]:
['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n time']], ...
- attributes: [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer...
[0]:
['key', ['bla']]
- name: key
- value: ['bla']
[1]:
['value', ['1243', '1233', '1235']]
- name: value
- value: ['1243', '1233', '1235']
[2]:
['_hans', ['hammer\n time']]
- name: _hans
- value: ['hammer\n time']
[3]:
['HaMer', ['765', '786', '890']]
- name: HaMer
- value: ['765', '786', '890']
[4]:
['value', ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']]
- name: value
- value: ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']
[5]:
['_mText', ['12.11.2015::13:20:0']]
- name: _mText
- value: ['12.11.2015::13:20:0']
[6]:
['value', ['war', 'fist']]
- name: value
- value: ['war', 'fist']
[7]:
['_obacht', ['fish,car,button']]
- name: _obacht
- value: ['fish,car,button']
[8]:
['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]]
- name: _id
- value: [['gibml', 'c0d8-4535-898f-968362779e07']]
[0]:
['gibml', 'c0d8-4535-898f-968362779e07']
[9]:
['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]
- name: bam
- value: [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]
[0]:
['boing', [['key', ['bla']]]]
- attributes: [['key', ['bla']]]
[0]:
['key', ['bla']]
- name: key
- value: ['bla']
- type: boing
[1]:
['boing', [['key', ['bla']]]]
- attributes: [['key', ['bla']]]
[0]:
['key', ['bla']]
- name: key
- value: ['bla']
- type: boing
- type: foo
It also successfully parses the linked example, once the leading version line is removed.
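To show how the results names might be used when walking the parsed structure, here is a short sketch. It repeats a trimmed version of the grammar (only plain value words, no quoted strings or GUIDs), and the sample string and the summary variable are made up for illustration.

```python
from pyparsing import (Forward, Group, OneOrMore, Optional, Suppress, Word,
                       ZeroOrMore, alphanums, alphas, delimitedList, printables)

LBRACE, RBRACE, SEMI, EQ, DASH = map(Suppress, "{};=-")
ident = Word(alphas + "_", alphanums + "_")
value_word = ~DASH + ~LBRACE + ~RBRACE + Word(printables, excludeChars=";")

object_ = Forward()
attr_value = OneOrMore(object_) | Optional(delimitedList(value_word, ";")) + SEMI
attr_defn = Group(DASH + ident("name") + EQ + Group(attr_value)("value"))
object_ <<= Group(
    LBRACE + ident("type") +
    Group(ZeroOrMore(attr_defn | object_))("attributes") +
    RBRACE
)

sample = "{ foo - key = bla; - bam = { boing - key = bla; } }"
root = object_.parseString(sample)[0]

# Named fields are read with dict-style access on the parse results.
root_type = root["type"]
summary = [(attr["name"], attr["value"].asList()) for attr in root["attributes"]]

print(root_type)
for name, value in summary:
    print(name, value)
```

A nested object shows up as a list inside the attribute value, and it carries its own "type" and "attributes" names, so the same access pattern can be applied recursively.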