Question

I have been trying to retrieve percentages from text using a regular expression. Unfortunately, re.sub does not pick up all of the matches.

import re

data = "The grades of the two students improved by 5.2% and 6.2%."
re_1 = re.compile(r"\b(\d+\.)?\d+(%|(\spercent))")
data = re.sub(re_1, "__PERCENTAGE__", data, re.I)

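I am trying to retrieve things like "5%", "20.2%", "5 percent", and "5.2 percent". It is fine for the word "percent" and the percent sign to be part of the match, but I suspect the trouble comes from overlapping matches. With the data above, the current output is: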

"The grades of the two students improved by __PERCENTAGE__ and 6.2%."

Any tips on how to make sure both percentages are matched? Many thanks.

PS: possibly relevant, I am using Python 3.

Answer 1

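The problem you are seeing is probably related to how `u` is handled in different Python 3 versions. Also, you pass a compiled regex object to re.sub together with re.I. re.sub does accept a compiled pattern as its first argument, but its fourth positional argument is count, not flags, so re.I (integer value 2) is read as a replacement limit and the pattern stays case-sensitive. Flags belong in re.compile or in the flags= keyword.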

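import re
# A number with an optional decimal part, followed by " percent" or "%".
p = re.compile(r'(\b\d+(?:\.\d+)?(?:\spercent|\%))')
test_str = "The grades of the two students improved by 5.2% and 5.4% it was 5 percent or 1.2%."
# No count is given, so every match is replaced.
result = re.sub(p, "__PERCENTAGE__", test_str)
print(result)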

In the IDEONE demo (using Python 3.4), the code runs fine and outputs

The grades of the two students improved by __PERCENTAGE__ and __PERCENTAGE__ it was __PERCENTAGE__ or __PERCENTAGE__.
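For what it is worth, here is a minimal sketch of one pitfall in the question's call: the fourth positional argument of re.sub is count, so a flag passed there is read as a replacement limit rather than as a flag. The sample sentence and the variable names (text, limited, full, found) are made up for illustration, and this may not be the only thing going on in the original code.

```python
import re
import warnings

# Hypothetical sample text, chosen so that case sensitivity matters.
text = "The grades improved by 5.2% and 6.2%, then by 5 PERCENT and 1.2 percent."
pattern = r"\b\d+(?:\.\d+)?(?:%|\spercent)"

# Pitfall: re.sub(pattern, repl, string, count=0, flags=0).
# re.I has the integer value 2, so this call means count=2, with no flags set.
# Recent Python versions may warn about a positional count, so silence that here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    limited = re.sub(pattern, "__PERCENTAGE__", text, re.I)

# Passing the flag by keyword replaces every match and ignores case.
full = re.sub(pattern, "__PERCENTAGE__", text, flags=re.I)

# Alternative: put the flag into the compiled pattern and call its sub method.
compiled = re.compile(pattern, re.I)
also_full = compiled.sub("__PERCENTAGE__", text)

# To retrieve the percentages instead of replacing them, findall returns
# the full matches because the pattern has no capturing groups.
found = re.findall(pattern, text, flags=re.I)

print(limited)
print(full)
print(found)
```

Only the first two percentages are replaced in limited, while full and also_full replace all four. Keeping the flag inside re.compile avoids the argument-order question entirely.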

Regex re.sub not detecting all matches
