Dollar amounts with commas as thousands separators

Time: 2018-10-12 16:39:13

Tags: python regex currency

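I am new to Python. I am trying to use a regular expression to extract a dollar amount from a substring. It works in most cases, but I have run into a few problems I cannot solve.

The resulting amount is a string and, because of the commas, it is not recognized as a number. It also does not work for small amounts under $1 (for example 0.89). There is no leading $. Any help would be appreciated.

Here is what I have: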

df['Amount']=df['description'].str.extract('(\d{1,3}?(\,\d{3})*\.\d{2})')

Here is a string that should be parsed:

000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265

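I am trying to extract the amount 1,107,844.38 into a separate column of the DataFrame. I do not have any strings that should be rejected.

2 answers:

Answer 0: (score: 0)

You can try a regular expression such as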

rx = r"\b(?<!/)(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\b(?!/)"
df['Amount']=df['description'].str.extract(rx)

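See the regex demo.

Details

  • \b - a word boundary
  • (?<!/) - no / immediately to the left of the current position (to avoid matching date-time values)
  • \d{1,3} - 1 to 3 digits
  • (?:,\d{3})* - 0 or more repetitions of , followed by 3 digits
  • (?:\.\d{2})? - an optional . followed by 2 digits
  • \b - a word boundary
  • (?!/) - no / immediately to the right of the current position (to avoid matching date-time values)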
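
The behaviour of the lookarounds described above can be checked on the sample string with Python's built-in re module. This is only a minimal sketch: the helper name extract_amount and the conversion to Decimal are assumptions added for illustration, not part of the answer.

```python
import re
from decimal import Decimal

# Pattern copied from the answer above.
RX = re.compile(r"\b(?<!/)(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\b(?!/)")

def extract_amount(text):
    """Return the first matched amount as a Decimal, or None if nothing matches.

    Hypothetical helper: strips the thousands separators before converting.
    """
    m = RX.search(text)
    if m is None:
        return None
    return Decimal(m.group(1).replace(",", ""))

sample = ("000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 "
          "07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265")

print(extract_amount(sample))        # 1107844.38
print(extract_amount("0.89"))        # 0.89
print(extract_amount("07/01/2013"))  # None (date parts are rejected)
```

Removing the commas before the conversion is what turns the extracted string into a number. In pandas the equivalent step would presumably be Series.str.replace(',', '', regex=False) followed by pd.to_numeric on the extracted column.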

Answer 1: (score: 0)

Given the sample string:

"000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"

Here is what I came up with:

import re

# subject is assumed to hold the sample string shown above
match = re.search(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)", subject)
if match:
    result = match.group("amount")  # result will be "1,107,844.38"
else:
    result = ""  # no amount found

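This extracts the amount. It also handles small amounts such as 0.38, numbers without thousands separators such as 123456789.38, and amounts with or without a leading dollar sign $.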
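
These cases can be checked directly. A small sketch follows; the helper find_amount and every test string other than the amounts quoted in this thread are made up for illustration.

```python
import re

# Pattern copied from the answer above.
AMOUNT_RE = re.compile(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)")

def find_amount(subject):
    """Return the first amount found in subject, or "" if there is none."""
    match = AMOUNT_RE.search(subject)
    return match.group("amount") if match else ""

# Hypothetical inputs covering the cases described above.
for text in ["fee 0.38 due",
             "total 123456789.38",
             "paid $1,107,844.38 today",
             "07/01/2013"]:
    print(repr(find_amount(text)))
```

Note that the captured text includes the $ when one is present, so both the $ and the commas would need to be stripped before converting the result to a number. The pattern also does not require the comma-separated groups to be exactly three digits long.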

Regex details

(?P<amount>\$?(?:\d+,)*\d+\.\d+)  Match the regular expression below and capture its match into a named group called “amount”
\$?                              Match the character “$” literally
?                                Between zero and one times, as many times as possible, giving back as needed (greedy) 
(?:\d+,)*                        Match the regular expression below 
*                                Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
\d+                              Match a single digit 0..9 
+                                Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
,                                Match the character “,” literally 
\d+                              Match a single digit 0..9 
+                                Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\.                               Match the character “.” literally 
\d+                              Match a single digit 0..9 
+                                Between one and unlimited times, as many times as possible, giving back as needed (greedy)
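
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is a sketch only; the name AMOUNT_VERBOSE and the shortened input string are assumptions for illustration.

```python
import re

# Same pattern as in the answer, spread out with one comment per element.
AMOUNT_VERBOSE = re.compile(r"""
    (?P<amount>      # named group: amount
        \$?          # optional literal dollar sign
        (?:\d+,)*    # zero or more runs of digits, each followed by a comma
        \d+          # one or more digits
        \.           # literal decimal point
        \d+          # one or more digits after the decimal point
    )
""", re.VERBOSE)

match = AMOUNT_VERBOSE.search("AP5378 1,107,844.38 Ven000000000463")
print(match.group("amount"))  # 1,107,844.38
```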