Question

Suppose we have the following string:

string = """
object obj1{
    attr1 value1;


    object obj2 {
        attr2 value2;
    }
}

object obj3{
    attr3 value3;
    attr4 value4;
}

"""

It contains a nested object, so we use Forward to parse it.

from pyparsing import *
word = Word(alphanums)

attribute = word.setResultsName("name")
value = word.setResultsName("value")

object_grammar = Forward()

attributes = attribute + value + Suppress(";") + LineEnd().suppress()
object_type = Suppress("object ") + word.setResultsName("object_type") + Suppress('{') + LineEnd().suppress()

object_grammar <<= object_type+\
    OneOrMore(attributes|object_grammar) + Suppress("}") | Suppress("};")

for i, (obj, _, _) in enumerate(object_grammar.scanString(string)):
    print('\n')
    print('Enumerating over object {}'.format(i))
    print('\n')
    print('This is the object type {}'.format(obj.object_type))
    print(obj.asXML())
    print(obj.asDict())
    print(obj.asList())
    print(obj)
    print(obj.dump())

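Here are the results. The obj.asXML() function contains all of the information, but because it has been flattened, the order of the information is essential to parsing the result. Is this the best way to do it? I must be missing something. I would like a solution that works for both nested and non-nested objects, i.e. obj1, obj2 and obj3.

Also, setResultsName('object_type') does not return the object_type of the parent object. The output of the program above is shown below. Any suggestions?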

Enumerating over object 0

This is the object type obj2

<ITEM>
  <object_type>obj1</object_type>
  <name>attr1</name>
  <value>value1</value>
  <object_type>obj2</object_type>
  <name>attr2</name>
  <value>value2</value>
</ITEM>
{'object_type': 'obj2', 'name': 'attr2', 'value': 'value2'}
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
- name: attr2
- object_type: obj2
- value: value2


Enumerating over object 1


This is the object type obj3

<ITEM>
  <object_type>obj3</object_type>
  <name>attr3</name>
  <value>value3</value>
  <name>attr4</name>
  <value>value4</value>
</ITEM>
{'object_type': 'obj3', 'name': 'attr4', 'value': 'value4'}
['obj3', 'attr3', 'value3', 'attr4', 'value4']
['obj3', 'attr3', 'value3', 'attr4', 'value4']
['obj3', 'attr3', 'value3', 'attr4', 'value4']
- name: attr4
- object_type: obj3
- value: value4

Answer 1

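I was able to solve this by passing listAllMatches=True to the setResultsName calls. This gave me an asXML() result whose structure I could retrieve the information from. It still relies on the order of the XML, and it requires zip to pair up the name and value of each attribute. I will leave this question open to see whether there is a better way to do this.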
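A minimal sketch of that approach, assuming a simplified flat grammar that only covers obj3 (the names attr_defn and object_defn are illustrative and do not come from the code in the question):

```python
from pyparsing import OneOrMore, Suppress, Word, alphanums

word = Word(alphanums)
LBRACE, RBRACE, SEMI = map(Suppress, "{};")

# listAllMatches=True keeps every match under the name, not just the last one.
attribute = word.setResultsName("name", listAllMatches=True)
value = word.setResultsName("value", listAllMatches=True)

# Illustrative flat grammar: one object with one or more "name value;" pairs.
attr_defn = attribute + value + SEMI
object_defn = (Suppress("object") + word.setResultsName("object_type")
               + LBRACE + OneOrMore(attr_defn) + RBRACE)

sample = """
object obj3{
    attr3 value3;
    attr4 value4;
}
"""

result = object_defn.parseString(sample)

# The names and values come back as two parallel lists, so they have to be
# paired up by position, which is the order dependence mentioned above.
pairs = list(zip(result["name"], result["value"]))
print(result["object_type"], pairs)
```

The pairing only works because both lists are in document order; nothing in the result ties a given value to its name.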

Answer 2

Although you have successfully processed your input string, I would like to suggest some refinements to your grammar.

When defining a recursive grammar, one usually wants to maintain some structure in the output results. In your case, the logical piece to structure is the content of each object, which is surrounded by opening and closing braces. Conceptually:

object_content = '{' + ZeroOrMore(attribute_defn | object_defn) + '}'

Then the supporting expressions (still just conceptually) are:

attribute_defn = identifier + attribute_value + ';'
object_defn = 'object' + identifier + object_content

The actual Python / pyparsing for this looks like:

from pyparsing import (Forward, Group, Keyword, OneOrMore, Suppress, Word,
                       ZeroOrMore, alphanums, alphas)

LBRACE, RBRACE, SEMI = map(Suppress, "{};")
word = Word(alphas, alphanums)
attribute = word
# expand to include other values if desired, such as ints, reals, strings, etc.
attribute_value = word
attributeDefn = Group(attribute("name") + attribute_value("value") + SEMI)

OBJECT = Keyword("object")
object_header = OBJECT + word("object_name")
object_grammar = Forward()
object_body = Group(LBRACE
                    + ZeroOrMore(object_grammar | attributeDefn)
                    + RBRACE)
object_grammar <<= Group(object_header + object_body("object_content"))

Group does two things for us: it structures the results into sub-objects, and it keeps the results names at one level from stepping on those at a different level (so there is no need for listAllMatches).
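To illustrate the second point in isolation, here is a small sketch that parses the same two attribute pairs with and without Group. The input text and the variable names flat and grouped are made up for this illustration:

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

word = Word(alphanums)
SEMI = Suppress(";")
text = "attr3 value3; attr4 value4;"

# Without Group: every pair writes to the same "name"/"value" slots,
# so only the last pair is still reachable by name.
flat = OneOrMore(word("name") + word("value") + SEMI).parseString(text)
print(flat.asList())
print(flat["name"], flat["value"])

# With Group: each pair becomes its own sub-result with its own names.
grouped = OneOrMore(Group(word("name") + word("value") + SEMI)).parseString(text)
pairs = [(g["name"], g["value"]) for g in grouped]
print(grouped.asList())
print(pairs)
```

This is the same overwriting that makes asDict() in the question report only attr4 for obj3.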

Now, instead of scanString, you can just process your input using OneOrMore and dump the resulting nested structure:

print(OneOrMore(object_grammar).parseString(string).dump())
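As a self-contained check, the sketch below restates the grouped grammar and walks the parsed groups into plain nested dicts. The helper to_dict and the shape of the dict it returns are assumptions made for this illustration; they are not part of pyparsing or of the grammar above:

```python
from pyparsing import (Forward, Group, Keyword, OneOrMore, Suppress, Word,
                       ZeroOrMore, alphanums, alphas)

LBRACE, RBRACE, SEMI = map(Suppress, "{};")
word = Word(alphas, alphanums)
attribute_defn = Group(word("name") + word("value") + SEMI)

object_grammar = Forward()
object_body = Group(LBRACE + ZeroOrMore(object_grammar | attribute_defn) + RBRACE)
object_grammar <<= Group(Keyword("object") + word("object_name")
                         + object_body("object_content"))

sample = """
object obj1{
    attr1 value1;

    object obj2 {
        attr2 value2;
    }
}

object obj3{
    attr3 value3;
    attr4 value4;
}
"""


def to_dict(obj):
    """Hypothetical helper: turn one parsed object group into a nested dict."""
    attributes, children = {}, []
    for item in obj["object_content"]:
        if "object_name" in item:      # nested object group
            children.append(to_dict(item))
        else:                          # attribute group
            attributes[item["name"]] = item["value"]
    return {"name": obj["object_name"],
            "attributes": attributes,
            "children": children}


parsed = OneOrMore(object_grammar).parseString(sample, parseAll=True)
trees = [to_dict(obj) for obj in parsed]
for tree in trees:
    print(tree)
```

Because each object and each attribute sits in its own Group, the walk never relies on token order to decide which value belongs to which name or which attribute belongs to which object.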

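I started out making just simple changes to your code, but your original version had a fatal flaw. Your parser split the left and right braces into two separate expressions; while this "works", it defeats the ability to define the group structure of the results.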

How do I get results from a pyparsing Forward object?
