Question

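I'm new to Python. I'm trying to use a regular expression to extract a dollar amount from within a string. It works in most cases, but I've run into a few problems I can't solve.

The resulting amount is a string, and because of the commas it isn't recognized as a number. It also doesn't work for small amounts under $1 (for example, 0.89). There is no leading $. Any help would be appreciated.

Here is what I have: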

df['Amount']=df['description'].str.extract('(\d{1,3}?(\,\d{3})*\.\d{2})')

Here is a string that should be parsed:

000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265

I'm trying to extract the amount 1,107,844.38 into a separate column of the DataFrame. I don't have any strings that should be rejected.

Answer 1

You can try a regular expression such as

rx = r"\b(?<!/)(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\b(?!/)"
df['Amount']=df['description'].str.extract(rx)

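See the regex demo.

Details

\b - a word boundary
(?<!/) - no / immediately to the left of the current position (to avoid matching date values)
\d{1,3} - 1 to 3 digits
(?:,\d{3})* - zero or more repetitions of a comma followed by 3 digits
(?:\.\d{2})? - an optional . followed by 2 digits
\b - a word boundary
(?!/) - no / immediately to the right of the current position (to avoid matching date values)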
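
The asker also wanted a number rather than a string. As a rough sketch (not part of the original answer), the same pattern can be tried with the standard `re` module and the commas stripped before conversion. The helper name `extract_amount` and the two short test strings are made up for illustration.

import re
from decimal import Decimal

# Pattern copied from the answer above.
rx = re.compile(r"\b(?<!/)(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\b(?!/)")

sample = (
    "000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 "
    "07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"
)

def extract_amount(text):
    """Hypothetical helper: first amount in text as a Decimal, or None."""
    m = rx.search(text)
    if m is None:
        return None
    # Strip the thousands separators so the string parses as a number.
    return Decimal(m.group(1).replace(",", ""))

amount = extract_amount(sample)                   # the 1,107,844.38 amount
small = extract_amount("REFUND 0.89 issued")      # an amount under $1
date_only = extract_amount("posted 07/01/2013")   # date parts are skipped

In pandas, the equivalent cleanup would typically be `.str.replace(',', '', regex=False)` on the extracted column followed by `pd.to_numeric`. One caveat: because the decimal part is optional, a standalone number of up to three digits (for example 123) also matches this pattern.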

Answer 2

Given the sample string:

"000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"

Here is what I came up with:

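import re

# The sample string shown above.
subject = "000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"

match = re.search(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)", subject)
if match:
    result = match.group("amount")  # result will be "1,107,844.38"
else:
    result = ""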

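This extracts the amount. It also handles small amounts such as 0.38, numbers without thousands separators such as 123456789.38, and amounts with or without a leading dollar sign $.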
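
To check those claims, here is a small sketch that runs the same pattern over a few made-up inputs and converts the result to a number. The names `find_amount` and `to_number` and the test strings are illustrative assumptions, not part of the original answer.

import re
from decimal import Decimal

# Same pattern as above, compiled once for reuse.
amount_rx = re.compile(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)")

def find_amount(text):
    """Hypothetical helper: first matched amount as a string, or ""."""
    m = amount_rx.search(text)
    return m.group("amount") if m else ""

def to_number(amount):
    """Hypothetical helper: drop "$" and "," so the text parses as a Decimal."""
    return Decimal(amount.replace("$", "").replace(",", ""))

cases = [
    "fee 0.38 charged",          # small amount
    "total 123456789.38 due",    # no thousands separators
    "paid $1,107,844.38 today",  # leading dollar sign
]
found = [find_amount(c) for c in cases]
no_cents = find_amount("total 1,000")  # no decimal part, so no match

Note that this pattern requires a decimal point, so a whole-dollar amount such as 1,000 is not picked up. If such values can occur in your data, the fractional part would need to be made optional.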

Regex details

(?P<amount>\$?(?:\d+,)*\d+\.\d+) Match the regular expression below and capture its match into backreference with name “amount” 
\$?                              Match the character “$” literally
?                                Between zero and one times, as many times as possible, giving back as needed (greedy) 
(?:\d+,)*                        Match the regular expression below 
*                                Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
\d+                              Match a single digit 0..9 
+                                Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
,                                Match the character “,” literally 
\d+                              Match a single digit 0..9 
+                                Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\.                               Match the character “.” literally 
\d+                              Match a single digit 0..9 
+                                Between one and unlimited times, as many times as possible, giving back as needed (greedy)
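
If you would rather keep this breakdown next to the code, one option is to write the pattern with `re.VERBOSE`, which ignores whitespace inside the pattern and allows `#` comments. This is only a sketch of the same regex. The variable names are made up for illustration.

import re

# Same pattern as above, spread over several lines with inline comments.
amount_verbose = re.compile(
    r"""
    (?P<amount>      # capture the whole amount in a group named "amount"
        \$?          # optional literal dollar sign
        (?:\d+,)*    # zero or more runs of digits, each followed by a comma
        \d+          # one or more digits
        \.           # literal decimal point
        \d+          # one or more digits after the decimal point
    )
    """,
    re.VERBOSE,
)

m = amount_verbose.search("AP5378 1,107,844.38 Ven000000000463")
verbose_result = m.group("amount") if m else ""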
