Question

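I am working with expressions made up of parentheses and +, such as (((a + b) + c) + (d + e)).

I need to find the parse tree of such an expression and then print it in list form, like: [[[a, b], c], [d, e]]

I was thinking I would use something like ast, and then ast2list. However, since I don't fully understand these, I keep getting syntax errors. This is what I have: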

import ast
import parser

a = ast.parse("(((a+b)+c)+(d+e))", mode='eval')
b = parser.ast2list(a)


print(b)

Can anyone point me in the right direction? Thanks.

Answer 1

Colleen's comment can be implemented with something like this:

# named expr rather than str, so the built-in str type is not shadowed
expr = "(((a+b)+c)+(d+e))"


replacements = [
    ('(','['),
    (')',']'),
    ('+',','),
    # If a,b,c,d,e are defined variables, you don't need the following 5 lines
    ('a',"'a'"),
    ('b',"'b'"),
    ('c',"'c'"),
    ('d',"'d'"),
    ('e',"'e'"),
]

for (f,s) in replacements:
    expr = expr.replace(f,s)

# eval executes arbitrary code, so only use it on input you trust
obj = eval(expr)

print(expr)      # [[['a','b'],'c'],['d','e']]
print(obj)       # [[['a', 'b'], 'c'], ['d', 'e']]
# You can access the parsed elements as you would any iterable:
print(obj[0])    # [['a', 'b'], 'c']
print(obj[1])    # ['d', 'e']
print(obj[1][0]) # d
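
If the input is not fully trusted, the same translation idea can be combined with ast.literal_eval, which only accepts literals. The following is a minimal sketch, not part of the original answer; the function name expr_to_nested_list is made up for illustration, and it assumes operands are plain identifiers.

```python
import ast
import re

def expr_to_nested_list(expr):
    """Translate a '+'/parentheses expression into nested lists (hypothetical helper)."""
    # Quote every identifier so the translated text contains only literals.
    quoted = re.sub(r"[A-Za-z_]\w*", lambda m: repr(m.group(0)), expr)
    # Map '(' -> '[', ')' -> ']' and '+' -> ',' in one pass.
    translated = quoted.translate(str.maketrans("()+", "[],"))
    # literal_eval only evaluates literals, so no arbitrary code can run.
    return ast.literal_eval(translated)

print(expr_to_nested_list("(((a+b)+c)+(d+e))"))
```

Because the identifiers are quoted automatically, the five per-letter replacement pairs are no longer needed.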

Answer 2

Take a look at the documentation for the ast module, which describes the NodeVisitor class.

import ast
import sys

class MyNodeVisitor(ast.NodeVisitor):
    # maps operator node types to their symbols
    op_dict = {
        ast.Add : '+',
        ast.Sub : '-',
        ast.Mult : '*',
    }

    # maps node types to functions that convert a node to its output form
    type_dict = {
        ast.BinOp: lambda s, n: s.handleBinOp(n),
        ast.Name: lambda s, n: n.id,
        # ast.Constant replaces the deprecated ast.Num (Python 3.8+)
        ast.Constant: lambda s, n: n.value,
    }

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ast = []

    def handleBinOp(self, node):
        # build an (operator, left, right) tuple, recursing into both operands
        return (self.op_dict[type(node.op)], self.handleNode(node.left),
                    self.handleNode(node.right))

    def handleNode(self, node):
        value = self.type_dict.get(type(node), None)
        return value(self, node)

    def visit_BinOp(self, node):
        op = self.handleBinOp(node)
        self.ast.append(op)

    def visit_Name(self, node):
        self.ast.append(node.id)

    def visit_Constant(self, node):
        self.ast.append(node.value)

    def currentTree(self):
        return reversed(self.ast)

a = ast.parse(sys.argv[1])
visitor = MyNodeVisitor()
visitor.visit(a)
print(list(visitor.currentTree()))

The output looks like this:

 $ ./ast_tree.py "5 + (1 + 2) * 3"
 [('+', 5, ('*', ('+', 1, 2), 3))]

Enjoy.
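
To get the exact list form asked for in the question, the same ast nodes can be walked with a small recursive function. This is only a sketch, assuming Python 3 and an expression that contains nothing but names and +; to_list is a hypothetical name.

```python
import ast

def to_list(node):
    """Convert a BinOp/Name tree into nested two-element lists (hypothetical helper)."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        # each '+' becomes a list holding its left and right operands
        return [to_list(node.left), to_list(node.right)]
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError("unsupported node: %s" % type(node).__name__)

# mode='eval' yields an ast.Expression whose .body is the expression node
tree = ast.parse("(((a+b)+c)+(d+e))", mode="eval")
print(to_list(tree.body))
```

Note that this mirrors Python's own parse tree rather than the parentheses in the text, so a + b + c without parentheses still comes out as [['a', 'b'], 'c'].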

Answer 3

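If you really want to write a parser, don't start by writing any code; first understand how your grammar should work. Backus-Naur Form, or BNF, is the typical notation used to define a grammar. Infix notation is a common parsing topic in software engineering, and the basic BNF structure for infix notation looks like this: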

letter ::= 'a'..'z'
operand ::= letter+
term ::= operand | '(' expr ')'
expr ::= term ( '+' term )*

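The key point is that term contains either an alphabetic operand or an entire subexpression wrapped in ()'s. That subexpression is the same as the overall expression, so this recursive definition takes care of all the parenthesis nesting. The expression is then a term followed by zero or more further terms, each added on with the binary '+' operator. (You could extend term to handle subtraction and multiplication/division too, but I'm not going to complicate this answer more than necessary.)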
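
To make the grammar concrete, each rule can be written as one function in a hand-rolled recursive-descent parser. The sketch below is an illustration, not part of the original answer; the names tokenize, parse_expr, parse_term and parse are hypothetical, and the tokenizer simply ignores characters outside the grammar.

```python
import re

def tokenize(text):
    # operand ::= letter+ ; everything else is '(' , ')' or '+'
    # (characters outside the grammar, such as spaces, are skipped)
    return re.findall(r"[a-z]+|[()+]", text)

def parse_expr(tokens, pos):
    # expr ::= term ( '+' term )*
    node, pos = parse_term(tokens, pos)
    items = [node]
    while pos < len(tokens) and tokens[pos] == "+":
        node, pos = parse_term(tokens, pos + 1)
        items.append(node)
    return items, pos

def parse_term(tokens, pos):
    # term ::= operand | '(' expr ')'
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        sub, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return sub, pos + 1
    if tok.isalpha():
        return tok, pos + 1
    raise SyntaxError("unexpected token %r" % tok)

def parse(text):
    tokens = tokenize(text)
    items, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[pos])
    return items

print(parse("(((a+b)+c)+(d+e))"))
```

The recursion from parse_term back into parse_expr is what handles arbitrary nesting, exactly as the recursive term rule does in the BNF.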

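Pyparsing is a package that makes it easy to translate a BNF into a working parser using Python objects (Ply, spark, and yapps are other parser packages, which follow the more traditional lex/yacc model of parser creation). Here is that BNF implemented directly with pyparsing: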

from pyparsing import Suppress, Word, alphas, Forward, Group, ZeroOrMore

LPAR, RPAR, PLUS = map(Suppress, "()+")
operand = Word(alphas)

# forward declare our overall expression, necessary when defining a recursive grammar
expr = Forward()

# each term is either an alpha operand, or an expr in ()'s
term = operand | Group(LPAR + expr + RPAR)

# define expr as a term, with optional '+ term's
expr <<= term + ZeroOrMore(PLUS + term)

# try it out
s = "(((a+b)+c)+(d+e))"
print(expr.parseString(s))

which gives:

[[[['a', 'b'], 'c'], ['d', 'e']]]

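Infix notation with precedence of operations is a pretty common parser, or part of a larger parser, so pyparsing includes a built-in helper called operatorPrecedence that takes care of all the nesting, grouping, and recursion. Here is the same parser written using operatorPrecedence: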

from pyparsing import operatorPrecedence, opAssoc, Word, alphas, Suppress

# define an infix notation with precedence of operations
# you only define one operation '+', so this is a simple case
operand = Word(alphas)
expr = operatorPrecedence(operand,
    [
    # Suppress keeps the '+' tokens themselves out of the parsed results
    (Suppress('+'), 2, opAssoc.LEFT),
    ])

# s is the same test string as in the previous example
print(expr.parseString(s))

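giving the same results as before.

More detailed examples can be found on the pyparsing wiki: the explicit implementation in fourFn.py and the operatorPrecedence implementation in simpleArith.py.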

Answer 4

I would go with a translator too. Doing this via ast is a bit cumbersome to implement for this purpose.

[tw-172-25-24-198 ~]$ cat a1.py
import re

def multiple_replace(text, adict):
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    return rx.sub(one_xlat, text)

# Closure based approach
def make_xlat(*args, **kwds):
    adict = dict(*args, **kwds)
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    def xlat(text):
        return rx.sub(one_xlat, text)
    return xlat

if __name__ == "__main__":
    text = "((a+b)+c+(d+(e+f)))"
    adict = {
        "+":",",
        "(":"[",
        ")":"]",
    }
    translate = make_xlat(adict)
    print(translate(text))

This should give:

[[a,b],c,[d,[e,f]]]

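Note: I have had this snippet in my collection for a long time. It comes from the Python Cookbook. It performs multiple replacements on a string in a single pass, using the keys of a dictionary as the strings to search for and the values as their replacements.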
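
The single pass matters when one replacement's output is another replacement's input. The following sketch contrasts the regex approach with chained str.replace calls; the swap mapping is a made-up example chosen only to show the difference.

```python
import re

def multiple_replace(text, mapping):
    # one alternation pattern, so every character is examined exactly once
    rx = re.compile("|".join(map(re.escape, mapping)))
    return rx.sub(lambda m: mapping[m.group(0)], text)

# hypothetical mapping that swaps the two kinds of parenthesis
swap = {"(": ")", ")": "("}
text = "(a+b)"

single_pass = multiple_replace(text, swap)

# chained replace: the second step also rewrites the output of the first
sequential = text
for old, new in swap.items():
    sequential = sequential.replace(old, new)

print(single_pass)  # )a+b(
print(sequential)   # (a+b(
```

For the mapping used in the answer the two approaches agree, since none of the replacement characters is itself a search key.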

Answer 5

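This is a simple enough problem that you could write a solution from scratch. It assumes that all variable names are one character long, or that the expression has already been converted into a list of tokens. I threw in checks to make sure all parentheses are matched; obviously you should swap out CustomError for whatever exception you want to throw or other action you want to take.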

def expr_to_list(ex):
    tree = []
    # stack of currently open lists; the innermost one is stack[-1]
    stack = [tree]
    for c in ex:
        if c == '(':
            # open a new sublist inside the current one
            new_node = []
            stack[-1].append(new_node)
            stack.append(new_node)
        elif c == '+' or c == ' ':
            continue
        elif c == ')':
            # compare by identity: only the root list itself means "nothing open"
            if stack[-1] is tree:
                raise CustomError('Unmatched Parenthesis')
            stack.pop()
        else:
            stack[-1].append(c)
    if stack[-1] is not tree:
        raise CustomError('Unmatched Parenthesis')
    return tree

Testing:

>>> expr_to_list('a + (b + c + (x + (y + z) + (d + e)))')
['a', ['b', 'c', ['x', ['y', 'z'], ['d', 'e']]]]

For multi-character variable names, use a regular expression to tokenize:

>>> import re
>>> tokens = re.findall(r'\(|\)|\+|\w+',
...                     '(apple + orange + (banana + grapefruit))')
>>> tokens
['(', 'apple', '+', 'orange', '+', '(', 'banana', '+', 'grapefruit', ')', ')']
>>> expr_to_list(tokens)
[['apple', 'orange', ['banana', 'grapefruit']]]
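
Putting the tokenizer and the stack-based loop together gives a self-contained variant. This is a sketch under the same assumptions as above; the names tokenize and tokens_to_list are hypothetical, and ValueError stands in for CustomError.

```python
import re

def tokenize(expr):
    # parentheses, '+', or a run of word characters; whitespace is skipped
    return re.findall(r"\(|\)|\+|\w+", expr)

def tokens_to_list(tokens):
    tree = []
    stack = [tree]          # stack of currently open lists
    for tok in tokens:
        if tok == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            stack.pop()
        elif tok != "+":
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return tree

print(tokens_to_list(tokenize("(apple + orange + (banana + grapefruit))")))
```

Checking the stack depth instead of comparing lists makes the unmatched-parenthesis test independent of what the lists contain.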

Parsing an expression into a list
