Question

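I need a regular expression that parses a string containing fractions and an operator (+, -, * or /) and that, when used with the findall function from the re module, returns a 5-element tuple containing the numerators, the denominators and the operator.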

Example: str = "15/9 + -9/5"

The output should have the form [("15","9","+","-9","5")]

I was able to come up with this:

pattern = r'-?\d+|\s+\W\s+'

print(re.findall(pattern,str))

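This produces the output ["15","9"," + ","-9","5"]. But after fiddling with it for a while, I have not been able to turn this into a 5-element tuple, and I cannot match the operator without also matching the whitespace around it.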

Answer 1

This pattern works:

(-?\d+)\/(\d+)\s+([+\-*/])\s+(-?\d+)\/(\d+)
# let's walk through it
(-?\d+) #matches any group of digits that may or may not have a `-` sign to group 1
       \/ #escape character to match `/`
         (\d+) #matches any group of digits to group 2
              \s+([+\-*/])\s+ #matches any '+,-,*,/' character and puts only that into group 3 (whitespace is not captured in group)
                              (-?\d+)\/(\d+) #exactly the same as group 1/2 for groups 4/5
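The same walkthrough can also live inside the pattern itself by using the re.VERBOSE flag. This is only a sketch of that idea; the name FRACTION_EXPR is made up for illustration and is not part of the answer above.

```python
import re

# FRACTION_EXPR is a hypothetical name. With re.VERBOSE, whitespace and
# "#" comments inside the pattern are ignored, so each piece can be annotated.
FRACTION_EXPR = re.compile(r"""
    (-?\d+)      # group 1: first numerator, optional leading minus
    /            # literal slash (no escaping needed in Python)
    (\d+)        # group 2: first denominator
    \s+          # whitespace before the operator, not captured
    ([+\-*/])    # group 3: exactly one operator character
    \s+          # whitespace after the operator, not captured
    (-?\d+)      # group 4: second numerator
    /
    (\d+)        # group 5: second denominator
""", re.VERBOSE)

result = FRACTION_EXPR.findall("15/9 + -9/5")
print(result)  # [('15', '9', '+', '-9', '5')]
```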

Demo:

>>> s = "15/9 + -9/5 6/12 * 2/3"
>>> re.findall(r'(-?\d+)\/(\d+)\s+([+\-*/])\s+(-?\d+)\/(\d+)',s)
[('15', '9', '+', '-9', '5'), ('6', '12', '*', '2', '3')]
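The reason this returns tuples while the pattern in the question returns a flat list is how findall treats capture groups: with no groups it returns each whole match as a string, and with several groups it returns one tuple of group contents per match. A small side-by-side sketch (the variable names flat and grouped are only illustrative):

```python
import re

s = "15/9 + -9/5"

# No capture groups: findall returns every whole match as a separate string,
# so the operator comes back together with its surrounding whitespace.
flat = re.findall(r'-?\d+|\s+\W\s+', s)

# Five capture groups: findall returns one 5-tuple per match, and the
# whitespace matched by \s+ outside the groups is dropped.
grouped = re.findall(r'(-?\d+)/(\d+)\s+([+\-*/])\s+(-?\d+)/(\d+)', s)

print(flat)     # ['15', '9', ' + ', '-9', '5']
print(grouped)  # [('15', '9', '+', '-9', '5')]
```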

Answer 2

A general approach to tokenizing a string with a regular expression is:

import re

pattern = r"\s*(\d+|[/+*-])"

def tokens(x):
  return [ m.group(1) for m in re.finditer(pattern, x) ]

print(tokens("9 / 4 +    6 "))

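Notes:

The regex starts with \s* to skip over any leading whitespace.
The part of the regex that matches a token is wrapped in parentheses to form a capture group.
The different token patterns are separated inside the capture group by the alternation operator |.
Be careful with \W, since it also matches whitespace.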
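The notes above can be checked with a short sketch. It reuses the same pattern; the names TOKEN and the sample strings are only illustrative. Note that this tokenizer has no notion of a signed number, so a leading minus comes back as its own token.

```python
import re

# Same pattern as in the answer, precompiled under a hypothetical name.
TOKEN = re.compile(r"\s*(\d+|[/+*-])")

def tokens(text):
    # group(1) holds only the token; the leading \s* is matched but not captured.
    return [m.group(1) for m in TOKEN.finditer(text)]

simple = tokens("9 / 4 +    6 ")
signed = tokens("15/9 + -9/5")
print(simple)  # ['9', '/', '4', '+', '6']
print(signed)  # ['15', '/', '9', '+', '-', '9', '/', '5']

# \W matches any non-word character, which includes spaces,
# whereas an explicit character class matches only the operators.
print(re.findall(r"\W", "9 / 4"))       # [' ', '/', ' ']
print(re.findall(r"[/+*-]", "9 / 4"))   # ['/']
```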

Regular expression for fractional math expressions using the Python re module