Question

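I'm trying to come up with a regular expression to search for dollar values in Python. I've looked at and tried a lot of solutions from SO posts, but none of them quite works.

The regex I came up with is: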

[Ss]        # OCR will mess up with dollar signs, so I'm specifically looking for S and s as the starting of what I'm looking for
\d+         # any digits to start off
(,\d{3})*   # include comma for thousand splits, can have multiple commas
(.\d{2})?   # include dot and 2 decimals, but only one occurrence of this part

I tried it on the following example:

import re

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
r = r"[Ss]\d+(,\d{3})*(.\d{2})?"

re.findall(pattern=r, string=t)

and I got

[(',330', '.00')]

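The regex documentation says:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

But it doesn't even return the integer part.

My question is: I really want to find s16,330.00 as a whole. Is there a solution?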

Answer 1

Replace the capturing groups with non-capturing ones so that findall returns the full matched string:

>>> t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
>>> r = r"[Ss]\d+(?:,\d{3})*(?:\.\d{2})?"
>>> re.findall(pattern=r, string=t)
['s16,330.00']

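Also note that the dot must be escaped in the regex; an unescaped . matches any character, not just a literal decimal point.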
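To see why the escape matters, here is a minimal sketch comparing the two forms. The input string `s16,330x00` is made up for illustration (imagine OCR misreading the decimal point) and does not come from the question.

```python
import re

# Hypothetical OCR output where the decimal point was misread as "x".
text = "s16,330x00"

unescaped = r"[Ss]\d+(?:,\d{3})*(?:.\d{2})?"   # "." matches any character
escaped = r"[Ss]\d+(?:,\d{3})*(?:\.\d{2})?"    # "\." matches only a literal dot

loose = re.findall(unescaped, text)    # swallows "x00" as if it were cents
strict = re.findall(escaped, text)     # stops before the stray "x"

print(loose)
print(strict)
```

With the unescaped dot the stray characters are accepted as a decimal part; with the escaped dot the match ends at the last thousands group.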

Answer 2

Use finditer:

import re

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
r = r"[Ss]\d+(,\d{3})*(\.\d{2})?"

result = [match.group() for match in re.finditer(pattern=r, string=t)]
print(result)

Output

['s16,330.00']

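The function finditer returns an iterator that yields match objects. Calling a match object's group method with no arguments returns the whole match, regardless of any capturing groups in the pattern.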
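As a rough sketch of what each match object exposes, the snippet below keeps the capturing groups and reads both the whole match and the individual groups. The variable names (`pattern`, `results`) are chosen here for illustration.

```python
import re

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
pattern = re.compile(r"[Ss]\d+(,\d{3})*(\.\d{2})?")

results = []
for match in pattern.finditer(t):
    whole = match.group()        # same as match.group(0): the entire match
    thousands = match.group(1)   # last repetition of the first group, or None
    cents = match.group(2)       # the optional decimal part, or None
    results.append((whole, thousands, cents))

print(results)
```

Because group() is always available, the groups can stay capturing when the individual parts are also of interest.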

Answer 3

Use a capturing group for the whole pattern and non-capturing groups for the subpatterns:

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
re.findall(r"([Ss]\d+(?:,\d{3})*(?:\.\d{2})?)", t)
['s16,330.00']

re.findall(pattern, string, flags=0)

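Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.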

https://docs.python.org/2/library/re.html#re.findall
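The three return shapes described in that passage can be sketched side by side. The middle pattern, which captures only the numeric part after the S, is an illustrative variant and not something proposed in the thread.

```python
import re

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"

# No capturing groups: a list of whole matches.
no_groups = re.findall(r"[Ss]\d+(?:,\d{3})*(?:\.\d{2})?", t)

# Exactly one capturing group: a list of that group's text
# (here the amount without the leading S, an illustrative choice).
one_group = re.findall(r"[Ss](\d+(?:,\d{3})*(?:\.\d{2})?)", t)

# Two capturing groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"[Ss]\d+(,\d{3})*(\.\d{2})?", t)

print(no_groups)
print(one_group)
print(two_groups)
```

The last case reproduces the result from the question: each tuple holds only the group contents, so the leading digits outside any group never appear.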

Python regex to match dollar values
