Parsing C-like declarations with pyparsing

Asked: 2015-02-25 07:12:09

Tags: python pyparsing

I would like to parse declarations in C-like source code (GLSL code) using pyparsing, in order to get a list of (type, name, value) tuples.

For example:

int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};

I would like to get something like:

[ ('int',   'a[3]', ''),
  ('int',   'b',    '1'),
  ('int',   'c',    '2.0'),
  ('float', 'd',    'f(z[2], 2) + 3*g(4,a)'),
  ('float', 'e',    ''),
  ('Point', 'f',    '{1,2}') ]

I tried using Forward() and operatorPrecedence() to parse the rhs expression, but I suspect they are not necessary in my case.

So far, I have:

from pyparsing import Regex, Literal, Group, Optional, delimitedList

IDENTIFIER = Regex('[a-zA-Z_][a-zA-Z_0-9]*')
INTEGER    = Regex('([+-]?(([1-9][0-9]*)|0+))')
EQUAL      = Literal("=").suppress()
SEMI       = Literal(";").suppress()
SIZE       = INTEGER | IDENTIFIER
VARNAME    = IDENTIFIER
TYPENAME   = IDENTIFIER
VARIABLE = Group(VARNAME.setResultsName("name")
                 + Optional(EQUAL + Regex("[^,;]*").setResultsName("value")))
VARIABLES = delimitedList(VARIABLE.setResultsName("variable",listAllMatches=True))
DECLARATION = (TYPENAME.setResultsName("type")
               + VARIABLES.setResultsName("variables", listAllMatches=True) + SEMI)

code = """
float a=1, b=3+f(2), c;
float d=1.0, e;
float f = z(3,4);
"""

for (token, start, end) in DECLARATION.scanString(code):
    for variable in token.variable:
        print(token.type, variable.name, variable.value)

However, the last declaration is not parsed, because of the comma inside the call in f = z(3,4).
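The behaviour can be reproduced with the standard `re` module alone, without pyparsing. This is only a minimal sketch: `VALUE_PATTERN` and `first_value` are names made up here for illustration, wrapping the same character class that the value Regex in VARIABLE uses.

```python
import re

# Same character class as the value Regex in the question (assumed name).
# It stops at the first ',' or ';', whether or not that character sits
# inside parentheses.
VALUE_PATTERN = re.compile(r"[^,;]*")


def first_value(rhs):
    """Return the text the pattern consumes from the start of rhs."""
    return VALUE_PATTERN.match(rhs).group(0)


print(first_value("3+f(2), c;"))  # 3+f(2)
print(first_value("z(3,4);"))     # z(3
```

In the second case the value is cut at the comma inside the call, so the rest of the declaration (`4);`) cannot be matched as another variable.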

2 answers:

Answer 0: (score: 0)

The pyparsing wiki has a C struct parser example that might give you a good start.

Answer 1: (score: 0)

This seems to work.

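# Note: keepOriginalText is available in older pyparsing releases only;
# newer releases provide originalTextFor for the same purpose.
from pyparsing import (Word, Regex, Literal, Optional, Empty, alphas, nums,
                       oneOf, nestedExpr, delimitedList, keepOriginalText)

IDENTIFIER       = Word(alphas+"_", alphas+nums+"_" )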
INT_DECIMAL      = Regex('([+-]?(([1-9][0-9]*)|0+))')
INT_OCTAL        = Regex('(0[0-7]*)')
INT_HEXADECIMAL  = Regex('(0[xX][0-9a-fA-F]*)')
INTEGER          = INT_HEXADECIMAL | INT_OCTAL | INT_DECIMAL
FLOAT            = Regex(r'[+-]?(((\d+\.\d*)|(\d*\.\d+))([eE][-+]?\d+)?)|(\d*[eE][+-]?\d+)')
LPAREN, RPAREN   = Literal("(").suppress(), Literal(")").suppress()
LBRACK, RBRACK   = Literal("[").suppress(), Literal("]").suppress()
LBRACE, RBRACE   = Literal("{").suppress(), Literal("}").suppress()
SEMICOLON, COMMA = Literal(";").suppress(), Literal(",").suppress()
EQUAL            = Literal("=").suppress()
SIZE             = INTEGER | IDENTIFIER
VARNAME          = IDENTIFIER
TYPENAME         = IDENTIFIER
OPERATOR         = oneOf("+ - * / [ ] . & ^ ! { }")

PART        = nestedExpr() | nestedExpr('{','}') | IDENTIFIER | INTEGER | FLOAT | OPERATOR
EXPR        = delimitedList(PART, delim=Empty()).setParseAction(keepOriginalText)
VARIABLE    = (VARNAME("name") + Optional(LBRACK + SIZE + RBRACK)("size")
                               + Optional(EQUAL + EXPR)("value"))
VARIABLES   = delimitedList(VARIABLE.setResultsName("variables",listAllMatches=True))
DECLARATION = (TYPENAME("type") + VARIABLES + SEMICOLON)

code = """
int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};
"""

for (token, start, end) in DECLARATION.scanString(code):
    vtype = token.type
    for variable in token.variables:
        name = variable.name
        size = variable.size
        value = variable.value
        s = "%s / %s" % (vtype,name)
        if size:  s += ' [%s]' % size[0]
        if value: s += ' / %s' % value[0]
        s += ";"
        print(s)
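The key idea in the grammar above is that commas nested inside (), [] or {} must not separate variables, which is what nestedExpr takes care of. For comparison, the same idea can be sketched in plain Python by tracking bracket depth. This is not pyparsing and not the method used in this answer. The helper names `split_top_level` and `parse_declaration` are made up for illustration, and the sketch assumes that every statement is a simple `type name[=value], ...` declaration with no comments, string literals or qualifiers.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested in (), [] or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_declaration(statement):
    """Turn 'type a=1, b' (without the ';') into (type, name, value) tuples."""
    vtype, rest = statement.strip().split(None, 1)
    result = []
    for item in split_top_level(rest):
        # Everything before the first '=' is the name, the rest is the value.
        name, _, value = item.partition("=")
        result.append((vtype, name.strip(), value.strip()))
    return result


code = """
int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};
"""

declarations = []
for statement in code.split(";"):
    if statement.strip():  # skip the empty tail after the last ';'
        declarations.extend(parse_declaration(statement))

for decl in declarations:
    print(decl)
```

On the sample input this yields the six tuples listed in the question, with the array size kept as part of the name ('a[3]'). Real GLSL needs more than this sketch handles, for example `for` loops that contain semicolons, or qualifiers such as `const` or `uniform` before the type.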