Question

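I tried to write a function that tokenizes a mathematical expression, turning the input string into a list of tokens, but without success. Is there a simple way to do this in Python? For example, given the expression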

sin( 1 + 2 * x ) + tan( 2.123 * x ),

I would like to get the list

[ 'sin', '(', '1', '+', '2', '*', 'x', ')', '+', 'tan', '(', '2.123', '*', 'x', ')' ]

Thanks in advance!

Answer 1

You can use the tokenize module: http://docs.python.org/2/library/tokenize.html. Here is an example:

>>> s = "sin( 1 + 2 * x ) + tan( 2.123 * x "
>>> import tokenize
>>> from StringIO import StringIO
>>> tokenize.tokenize(StringIO(s).readline)
1,0-1,3:    NAME    'sin'
1,3-1,4:    OP  '('
1,5-1,6:    NUMBER  '1'
1,7-1,8:    OP  '+'
1,9-1,10:   NUMBER  '2'
1,11-1,12:  OP  '*'
1,13-1,14:  NAME    'x'
1,15-1,16:  OP  ')'
1,17-1,18:  OP  '+'
1,19-1,22:  NAME    'tan'
1,22-1,23:  OP  '('
1,24-1,29:  NUMBER  '2.123'
1,30-1,31:  OP  '*'
1,32-1,33:  NAME    'x'
# tokenize raises an error at this point (the input lacks its closing parenthesis), which you have to catch
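
The session above is Python 2 (StringIO module, tokenize.tokenize printing its output). For Python 3, a minimal sketch of the same idea might look like the following. The helper name tokenize_expr and the choice to re-raise the error as ValueError are my own assumptions, not part of the tokenize module.

```python
import io
import tokenize

# Token types that carry no expression content and are dropped.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
         tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}

def tokenize_expr(text):
    """Hypothetical helper: return the token strings found in `text`."""
    tokens = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(text).readline):
            if tok.type not in _SKIP:
                tokens.append(tok.string)
    except tokenize.TokenError as exc:
        # Raised, for example, when a parenthesis is never closed.
        raise ValueError("could not tokenize expression") from exc
    return tokens

print(tokenize_expr("sin( 1 + 2 * x ) + tan( 2.123 * x )"))
```

Keep in mind that this tokenizes according to Python's own lexical rules, so it only fits expressions whose syntax looks like Python.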

There is also a way to do it with regular expressions:

Here is a link that explains the regex; the site is also a great tool for testing and exploring regular expressions: http://regex101.com/r/bP6kH1

>>> s = "sin( 1 + 2 * x ) + tan( 2.123 * x "
>>> import re
>>> re.findall(r"(\b\w*[\.]?\w+\b|[\(\)\+\*\-\/])", s)
['sin', '(', '1', '+', '2', '*', 'x', ')', '+', 'tan', '(', '2.123', '*', 'x']
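
Note that re.findall silently skips any character the pattern does not cover. If you would rather get an error for unexpected input, a scanning loop is one option. The following is only a sketch: the token classes (decimal numbers, identifiers, the operators + - * / ^ and parentheses) and the name regex_tokenize are assumptions of mine, so adjust them to your grammar.

```python
import re

# One alternative per token class; longer number forms come first.
TOKEN_RE = re.compile(r"""
    \d+\.\d* | \.\d+ | \d+     # numbers such as 2.123, .5, 42
  | [A-Za-z_]\w*               # names such as sin, x
  | [()+\-*/^]                 # operators and parentheses
""", re.VERBOSE)

def regex_tokenize(text):
    """Hypothetical helper: split `text` into tokens, rejecting unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at position %d" % (text[pos], pos))
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(regex_tokenize("sin(1+2*x)+tan(2.123*x)"))
```

Because the loop matches token by token, it does not depend on spaces between the tokens.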

Answer 2

You can use pyparsing to parse this kind of expression:

from pyparsing import *

expr = Forward()

double = Word(nums + ".").setParseAction(lambda t:float(t[0]))
integer = Word(nums).setParseAction(lambda t:int(t[0]))
variable = Word(alphas)
string = dblQuotedString
funccall = Group(variable + "(" + Group(Optional(delimitedList(expr))) + ")")
array_func = Group(funccall + "[" + Group(delimitedList(expr, "][")) + "]")
array_var = Group(variable + "[" + Group(delimitedList(expr, "][")) + "]")

operand = double | string | array_func | funccall | array_var | variable

expop = Literal('^')
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')

expr << operatorPrecedence( operand,
[("^", 2, opAssoc.RIGHT),
(signop, 1, opAssoc.RIGHT),
(multop, 2, opAssoc.LEFT),
(plusop, 2, opAssoc.LEFT),]
)

result = expr.parseString('sin( 1 + 2 * x ) + tan( 2.123 * x )')
print result

This prints:

[[['sin', '(', [[1.0, '+', [2.0, '*', 'x']]], ')'], '+', ['tan', '(', [[2.123, '*', 'x']], ')']]]

The result is a nested list whose structure reflects operator precedence. To get the flat list you asked for, just flatten it:

import collections

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

print list(flatten(result))

This prints:

['sin', '(', 1.0, '+', 2.0, '*', 'x', ')', '+', 'tan', '(', 2.123, '*', 'x', ')']
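
The flatten function above is Python 2 code: basestring does not exist in Python 3, and the collections.Iterable alias was removed in Python 3.10. A Python 3 sketch of the same generator could look like this. To keep it runnable without pyparsing, a plain nested list stands in for the parse result here, which is an assumption for illustration only.

```python
from collections.abc import Iterable

def flatten(items):
    """Yield the leaves of an arbitrarily nested iterable, left to right."""
    for el in items:
        # Strings are iterable too, but they must be treated as leaves.
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

# Stand-in for the nested structure printed above.
nested = [[['sin', '(', [[1.0, '+', [2.0, '*', 'x']]], ')'],
           '+', ['tan', '(', [[2.123, '*', 'x']], ')']]]
print(list(flatten(nested)))
```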

Alternatively, if you only want to tokenize, without regard for operator precedence or structure, you can do it in one line:

>>> from pyparsing import *
>>> OneOrMore(Word(alphas+"_", alphanums+"_") | Word(printables)).parseString("sin( 1 + 2 * x ) + tan( 2.123 * x )").asList()
['sin', '(', '1', '+', '2', '*', 'x', ')', '+', 'tan', '(', '2.123', '*', 'x', ')']
