Question

I learned that in pyparsing you can name an element/group/node like this:

token = pyparsing.Literal("Foobar")("element_name_here")

So I wrote a sample program to test it:

import pyparsing as pp

Prefix = pp.Word(pp.nums)("Prefix")
Name = pp.Literal("FOOBAR")("Name")
Modifier = pp.Word(pp.alphas)("Modifier")
Modifier_Group = pp.Group(pp.OneOrMore(Modifier))("Modifier_Group")
Sentence = pp.Group(pp.Optional(Prefix) + Name + Modifier_Group)("Sentence")

out = Sentence.parseString("123 FOOBAR testA testB")

Then I tried to get the output using these named tokens.

I tried:

>>> print out
[['123', 'FOOBAR', ['testA', 'testB']]]

...but that doesn't give me the token names.

Then I tried the following:

>>> print out.items()
[('Sentence', (['123', 'FOOBAR', (['testA', 'testB'], {'Modifier': [('testA', 0), 
('testB', 1)]})], {'Modifier_Group': [((['testA', 'testB'], {'Modifier': [('testA', 0),
('testB', 1)]}), 2)], 'Prefix': [('123', 0)], 'Name': [('FOOBAR', 1)]}))]

>>> print dict(out)

{'Sentence': (['123', 'FOOBAR', (['testA', 'testB'], {'Modifier': [('testA', 0), 
('testB', 1)]})], {'Modifier_Group': [((['testA', 'testB'], {'Modifier': [('testA', 0),
('testB', 1)]}), 2)], 'Prefix': [('123', 0)], 'Name': [('FOOBAR', 1)]})}

>>> import collections
>>> print collections.OrderedDict(out)
OrderedDict([('Sentence', (['123', 'FOOBAR', (['testA', 'testB'], {'Modifier': [
('testA', 0), ('testB', 1)]})], {'Modifier_Group': [((['testA', 'testB'], 
{'Modifier': [('testA', 0), ('testB', 1)]}), 2)], 'Prefix': [('123', 0)], 
'Name': [('FOOBAR', 1)]}))])

...but these contain a peculiar mix of dicts, lists, and tuples, and I couldn't figure out how to parse them. Then I tried this:

>>> print out.asXML()
<Sentence>
  <Sentence>
    <Prefix>123</Prefix>
    <Name>FOOBAR</Name>
    <Modifier_Group>
      <Modifier>testA</Modifier>
      <Modifier>testB</Modifier>
    </Modifier_Group>
  </Sentence>
</Sentence>

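...which gives me exactly what I want, except that it is XML rather than a Python data structure I can easily manipulate. Is there a way to get such a data structure (without having to parse the XML)?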

I did find a solution that returns a nested dict, but dicts in Python are unordered (and I want the tokens in order), so it doesn't work for me.

Answer 1

Pyparsing returns ParseResults objects that already give you that structure. You can visualize the structure of your sentence by printing out.dump():

>>> print out.dump()
[['123', 'FOOBAR', ['testA', 'testB']]]
- Sentence: ['123', 'FOOBAR', ['testA', 'testB']]
  - Modifier_Group: ['testA', 'testB']
    - Modifier: testB
  - Name: FOOBAR
  - Prefix: 123
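
One detail in the dump above: Modifier lists only testB, because a results name keeps just the last match by default. A minimal sketch of one way to keep every match, assuming the trailing "*" shorthand in a results name (pyparsing's short form of listAllMatches=True); the variable name modifiers is hypothetical and used only for illustration:

```python
import pyparsing as pp

# A trailing "*" in a results name asks pyparsing to accumulate every
# match under that name instead of keeping only the last one.
Prefix = pp.Word(pp.nums)("Prefix")
Name = pp.Literal("FOOBAR")("Name")
Modifier = pp.Word(pp.alphas)("Modifier*")
Modifier_Group = pp.Group(pp.OneOrMore(Modifier))("Modifier_Group")
Sentence = pp.Group(pp.Optional(Prefix) + Name + Modifier_Group)("Sentence")

out = Sentence.parseString("123 FOOBAR testA testB")

# `modifiers` is a hypothetical name used only for this illustration.
# list() turns the accumulated matches into a plain Python list.
modifiers = list(out.Sentence.Modifier_Group.Modifier)
print(modifiers)
```

The rest of the grammar is the same as in the question; only the Modifier name changes.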

You can access these elements as if they were keys in a dict:

>>> print out.Sentence.keys()
['Modifier_Group', 'Prefix', 'Name']
>>> print out.Sentence['Prefix']
123

or as attributes of an object:

>>> print out.Sentence.Name
FOOBAR
>>> print out.Sentence.Prefix
123
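
If an ordered, name-tagged structure similar to the XML output is still wanted, one option is to walk the ParseResults recursively. This is a rough sketch, not pyparsing's own method: it assumes every named element (including the leaves) is wrapped in pp.Group so that getName() can report a name for each node, and the helper to_named_tree is hypothetical. It uses the camelCase method names from the question; newer pyparsing releases also offer snake_case spellings.

```python
import pyparsing as pp

# Assumption: each named element is wrapped in Group, so every named
# node is itself a ParseResults whose getName() returns its name.
Prefix = pp.Group(pp.Word(pp.nums))("Prefix")
Name = pp.Group(pp.Literal("FOOBAR"))("Name")
Modifier = pp.Group(pp.Word(pp.alphas))("Modifier")
Modifier_Group = pp.Group(pp.OneOrMore(Modifier))("Modifier_Group")
Sentence = pp.Group(pp.Optional(Prefix) + Name + Modifier_Group)("Sentence")


def to_named_tree(results):
    """Hypothetical helper: convert ParseResults into nested
    (name, children) tuples, preserving token order."""
    tree = []
    for item in results:
        if isinstance(item, pp.ParseResults):
            # Named group: record its name and recurse into its tokens.
            tree.append((item.getName(), to_named_tree(item)))
        else:
            # Plain token string: keep it as-is.
            tree.append(item)
    return tree


out = Sentence.parseString("123 FOOBAR testA testB")
tree = to_named_tree(out)
print(tree)
```

Lists of tuples keep the tokens in parse order and allow repeated names such as Modifier, which a plain dict would collapse.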
