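Recursion with lists of values in pyparsing (IBM Rhapsody)

Date: 2016-04-10 01:03:25

Tags: python recursion pyparsing rhapsody

I am building a parser for the IBM Rhapsody sbs file format, but unfortunately the recursive part does not work as expected. The rule pp.Word(pp.printables + " ") is probably the problem, since it also matches ;, { and }. But at least ; can also be part of a value.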

import pyparsing as pp
import pprint


TEST = r"""{ foo
    - key = bla;
    - value = 1243; 1233; 1235;
    - _hans = "hammer
    time";
    - HaMer = 765; 786; 890;
    - value = "
    #pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
        ";
    - _mText = 12.11.2015::13:20:0;
    - value = "war"; "fist";
    - _obacht = "fish,car,button";
    - _id = gibml c0d8-4535-898f-968362779e07;
    - bam = { boing
        - key = bla;
    }
    { boing
        - key = bla;
    }
}
"""


def flat(loc, toks):
    if len(toks[0]) == 1:
        return toks[0][0]

assignment = pp.Suppress("-") + pp.Word(pp.alphanums + "_") + pp.Suppress("=")

value = pp.OneOrMore(
    pp.Group(assignment + (
        pp.Group(pp.OneOrMore(
            pp.QuotedString('"', escChar="\\", multiline=True) +
            pp.Suppress(";"))).setParseAction(flat) |
        pp.Word(pp.alphas) + pp.Suppress(";") |
        pp.Word(pp.printables + " ")
    ))
)

expr = pp.Forward()

expr = pp.Suppress("{") + pp.Word(pp.alphas) + (
    value | (assignment + expr) | expr
) + pp.Suppress("}")
expr = expr.ignore(pp.pythonStyleComment)


print TEST
pprint.pprint(expr.parseString(TEST).asList())

Output:

% python prase.py                                                    
{ foo
    - key = bla;
    - value = 1243; 1233; 1235;
    - _hans = "hammer
    time";
    - HaMer = 765; 786; 890;
    - value = "
    #pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
        ";
    - _mText = 12.11.2015::13:20:0;
    - value = "war"; "fist";
    - _obacht = "fish,car,button";
    - _id = gibml c0d8-4535-898f-968362779e07;
    - bam = { boing
        - key = bla;
    }
    { boing
        - key = bla;
    }
}

['foo',
 ['key', 'bla'],
 ['value', '1243; 1233; 1235;'],
 ['_hans', 'hammer\n    time'],
 ['HaMer', '765; 786; 890;'],
 ['value', '\n    #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n        '],
 ['_mText', '12.11.2015::13:20:0;'],
 ['value', ['war', 'fist']],
 ['_obacht', 'fish,car,button'],
 ['_id', 'gibml c0d8-4535-898f-968362779e07;'],
 ['bam', '{ boing'],
 ['key', 'bla']]

1 answer:

Answer 0 (score: 2)

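Wow, that is one messy model format! I think this will get you close. I started by trying to characterize what a valid value expression could be. I saw that each grouping can contain ';'-terminated attribute definitions or '{}'-enclosed nested objects. Each object begins with a leading identifier that gives the object type.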
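The nesting described above needs a self-referencing expression. Here is a minimal sketch of just that part, with attributes left out; the name `nested` and the sample string are made up for illustration. One thing it shows: in the question's grammar the name `expr` is rebound with a plain `=` after the Forward is created, so the `expr` referenced inside the definition is still the empty placeholder. Assigning with `<<=` avoids that.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, "{}")
ident = pp.Word(pp.alphas + "_", pp.alphanums + "_")

# Declare the placeholder first so it can be referenced in its own definition.
nested = pp.Forward()
# `<<=` fills the existing Forward in place; a plain `=` would rebind the name
# and leave the inner reference pointing at an empty placeholder.
nested <<= pp.Group(LBRACE + ident + pp.ZeroOrMore(nested) + RBRACE)

# Hypothetical input: objects with a type name and nested objects only.
result = nested.parseString("{ foo { boing } { bar { baz } } }").asList()
print(result)
```

Each Group keeps one object's contents together, so the nesting of the input is preserved in the nested lists of the result.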

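The difficult part was the very general token that I named 'value_word', which is just about any group of characters, as long as it does not start with '-', '{' or '}'. The negative lookaheads in the definition of 'value_word' take care of this. I think a key point here is that I did not include ' ' as a valid character in a 'value_word', and instead let pyparsing do its default whitespace skipping, so that one or more 'value_word's can make up an 'attr_value'.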
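To see the lookaheads and the whitespace skipping in isolation, here is a small sketch. The `words` expression and the two input strings are only for illustration and are not part of the grammar in this answer.

```python
import pyparsing as pp

LBRACE, RBRACE, DASH = map(pp.Suppress, "{}-")

# A word may contain any printable character except ';', but the negative
# lookaheads reject it if it would start with '-', '{' or '}'.
value_word = ~DASH + ~LBRACE + ~RBRACE + pp.Word(pp.printables, excludeChars=";")
words = pp.OneOrMore(value_word)

# Whitespace between words is skipped by pyparsing; parsing stops at the ';'.
first = words.parseString("gibml c0d8-4535-898f-968362779e07;").asList()
# The lookahead on '}' stops the repetition before the closing brace.
second = words.parseString("12.11.2015::13:20:0 } rest").asList()
print(first)
print(second)
```

A '-' inside a word is fine, since the lookahead only checks the position where the word would start.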

The final kicker (not found in your test case, but present in the example you linked to) was this line for an attribute assignment:

            - m_pParent = ;

So attr_value also has to allow for an empty string.
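A reduced sketch of how an empty value can be accepted: wrapping the value list in Optional lets the terminating ';' match on its own. The `word` expression here is a simplified stand-in for the full value grammar, an assumption made to keep the example short.

```python
import pyparsing as pp

SEMI, EQ, DASH = map(pp.Suppress, ";=-")
ident = pp.Word(pp.alphas + "_", pp.alphanums + "_")
# Simplified stand-in for a value: any printable characters except ';'.
word = pp.Word(pp.printables, excludeChars=";")

# Optional(...) allows the list of values to be empty before the final ';'.
attr_value = pp.Optional(pp.delimitedList(word, ";")) + SEMI
attr_defn = pp.Group(DASH + ident("name") + EQ + pp.Group(attr_value)("value"))

empty = attr_defn.parseString("- m_pParent = ;")[0]
filled = attr_defn.parseString("- key = bla;")[0]
print(empty.asList())
print(filled.asList())
```

The empty attribute still yields a (now empty) value group, so every attribute has the same two-part shape.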

from pyparsing import *

LBRACE,RBRACE,SEMI,EQ,DASH = map(Suppress,"{};=-")

ident = Word(alphas + '_', alphanums+'_').setName("ident")
guid = Group('GUID' + Combine(Word(hexnums)+('-'+Word(hexnums))*4))
qs = QuotedString('"', escChar="\\", multiline=True)
character_literal = Combine("'" + oneOf(list(printables+' ')) + "'")
value_word = ~DASH + ~LBRACE + ~RBRACE + Word(printables, excludeChars=';').setName("value_word")

value_atom = guid | qs | character_literal | value_word

object_ = Forward()

attr_value = OneOrMore(object_) | Optional(delimitedList(Group(value_atom+OneOrMore(value_atom))|value_atom, ';')) + SEMI
attr_value.setName("attr_value")
attr_defn = Group(DASH + ident("name") + EQ + Group(attr_value)("value"))
attr_defn.setName("attr_defn")

object_ <<= Group(
    LBRACE + ident("type") +
    Group(ZeroOrMore(attr_defn | object_))("attributes") + 
    RBRACE
    )

object_.parseString(TEST).pprint()

For your test string, this gives:

[['foo',
  [['key', ['bla']],
   ['value', ['1243', '1233', '1235']],
   ['_hans', ['hammer\n    time']],
   ['HaMer', ['765', '786', '890']],
   ['value', ['\n    #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n        ']],
   ['_mText', ['12.11.2015::13:20:0']],
   ['value', ['war', 'fist']],
   ['_obacht', ['fish,car,button']],
   ['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]],
   ['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]]]]

I added results names that may help when processing these structures. Calling object_.parseString(TEST).dump() gives this output:

[['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n    time']], ...
[0]:
  ['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n    time']], ...
  - attributes: [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer...
    [0]:
      ['key', ['bla']]
      - name: key
      - value: ['bla']
    [1]:
      ['value', ['1243', '1233', '1235']]
      - name: value
      - value: ['1243', '1233', '1235']
    [2]:
      ['_hans', ['hammer\n    time']]
      - name: _hans
      - value: ['hammer\n    time']
    [3]:
      ['HaMer', ['765', '786', '890']]
      - name: HaMer
      - value: ['765', '786', '890']
    [4]:
      ['value', ['\n    #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n        ']]
      - name: value
      - value: ['\n    #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n        ']
    [5]:
      ['_mText', ['12.11.2015::13:20:0']]
      - name: _mText
      - value: ['12.11.2015::13:20:0']
    [6]:
      ['value', ['war', 'fist']]
      - name: value
      - value: ['war', 'fist']
    [7]:
      ['_obacht', ['fish,car,button']]
      - name: _obacht
      - value: ['fish,car,button']
    [8]:
      ['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]]
      - name: _id
      - value: [['gibml', 'c0d8-4535-898f-968362779e07']]
        [0]:
          ['gibml', 'c0d8-4535-898f-968362779e07']
    [9]:
      ['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]
      - name: bam
      - value: [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]
        [0]:
          ['boing', [['key', ['bla']]]]
          - attributes: [['key', ['bla']]]
            [0]:
              ['key', ['bla']]
              - name: key
              - value: ['bla']
          - type: boing
        [1]:
          ['boing', [['key', ['bla']]]]
          - attributes: [['key', ['bla']]]
            [0]:
              ['key', ['bla']]
              - name: key
              - value: ['bla']
          - type: boing
  - type: foo
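The results names in the dump above can be used to reach fields by key instead of by position. Below is a condensed sketch, assuming a reduced grammar in which `word` stands in for value_atom (no GUIDs, quoted strings or character literals). The names `obj`, `sample` and `by_name` are hypothetical.

```python
import pyparsing as pp

LBRACE, RBRACE, SEMI, EQ, DASH = map(pp.Suppress, "{};=-")
ident = pp.Word(pp.alphas + "_", pp.alphanums + "_")
word = ~DASH + ~LBRACE + ~RBRACE + pp.Word(pp.printables, excludeChars=";")

obj = pp.Forward()
attr_value = pp.OneOrMore(obj) | pp.Optional(pp.delimitedList(word, ";")) + SEMI
attr_defn = pp.Group(DASH + ident("name") + EQ + pp.Group(attr_value)("value"))
obj <<= pp.Group(
    LBRACE + ident("type")
    + pp.Group(pp.ZeroOrMore(attr_defn | obj))("attributes")
    + RBRACE
)

# Hypothetical sample with unique attribute names.
sample = "{ foo - key = bla; - value = 1243; 1233; 1235; - bam = { boing - key = bla; } }"
root = obj.parseString(sample)[0]

root_type = root["type"]
# A dict only works when names are unique; repeated names (such as 'value'
# in the question's TEST string) would overwrite each other.
by_name = {attr["name"]: attr["value"].asList() for attr in root["attributes"]}
print(root_type)
print(by_name)
```

For input with repeated attribute names, iterating over root["attributes"] and reading each entry's "name" and "value" keeps every occurrence.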

It also successfully parses the linked example, once the leading version line is removed.