Python: extract numbers from a string with a regular expression

Time: 2016-06-20 01:33:50

Tags: regex python-2.7

I have this list of strings: $3 million, $910,000, $16.5-18 million [ 2 ]

I'm trying to convert them to floats, so $3 million would become 3000000, and for $16.5 - 18 million I would take the average of 16.5 and 18 (17.25 million, i.e. 17250000).

I tried the regular expression re.search('\$(.*)million', budget).group(1) to get the part between $ and million, but I don't know how to handle the range format ($16.5-18 million).
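A minimal sketch of how the regex from the question could be extended to handle ranges: capture the text between $ and million, split it on "-", and average the pieces. It assumes every relevant string has the form $<number> million or $<number>-<number> million; the helper name parse_million is hypothetical, and strings without "million" (such as $910,000) simply return None here.

```python
import re

def parse_million(budget):
    # Hypothetical helper: capture the text between "$" and "million"
    # (non-greedy, so trailing spaces before "million" are left out).
    m = re.search(r'\$(.*?)\s*million', budget)
    if m is None:
        return None  # no "million" suffix: not handled by this sketch
    # "16.5-18" -> [16.5, 18.0]; "3" -> [3.0]; float() ignores surrounding spaces
    parts = [float(p) for p in m.group(1).split("-")]
    return sum(parts) / len(parts) * 1000000

print(parse_million("$3 million"))              # 3000000.0
print(parse_million("$16.5-18 million [ 2 ]"))  # 17250000.0
```

The non-greedy (.*?) keeps the capture from swallowing the space before "million", and a single value is treated as a one-element "range", so both cases share one code path.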

1 answer:

Answer 0 (score: 2)

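I suggest the following solution, which extracts the relevant numbers (including ranges) from a larger text and converts them to float values.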

import re

def xNumber(arg):          # Return the multiplier for a suffix word, else 1 (also when there is no suffix)
    switcher = {
        "mln": 1000000,
        "million": 1000000,
        "bln": 1000000000,
        "billion": 1000000000,
        "thousand": 1000,
        "hundred": 100
    }
    return switcher.get(arg, 1)

rx = re.compile(r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')
s = "$3 million, $910,000,$16.5-18 million"
for match in rx.finditer(s):
    number = match.group("number").replace(",", "")  # Remove digit-grouping commas before calling float()
    multiplier = xNumber(match.group("suffix"))       # 1 when no suffix was matched
    if "-" in number:   # Range: use the average of both ends
        lst = [float(x) for x in number.split("-")]
        result = sum(lst) / len(lst) * multiplier
    else:               # Single number, with or without a suffix
        result = float(number) * multiplier
    print(result)

See the IDEONE demo.

The regular expression is r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?'

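  • \$ - a literal $ sign
  • (?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?) - a number with an optional , as the digit-grouping symbol (at most one ,ddd group), an optional fractional part and an optional -range
  • (?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))? - an optional group that matches one of the alternative "suffixes" after zero or more whitespace characters.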
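Because (?:,\d{3})? allows only one digit group, an amount like $1,250,000 would be cut off after $1,250. A possible variant, sketched below, swaps ? for * in both number parts and wraps the loop in a function that returns a list of floats. The names MULTIPLIERS, AMOUNT_RX and extract_amounts are hypothetical, chosen for this illustration only.

```python
import re

# Multipliers for the suffixes listed in the regex above.
MULTIPLIERS = {
    "mln": 1000000, "million": 1000000,
    "bln": 1000000000, "billion": 1000000000,
    "thousand": 1000, "hundred": 100,
}

# Same pattern as in the answer, except (?:,\d{3})* instead of (?:,\d{3})?,
# so amounts with several digit groups such as "$1,250,000" match in full.
AMOUNT_RX = re.compile(
    r'\$(?P<number>\d+(?:,\d{3})*(?:\.\d+)?(?:-\d+(?:,\d{3})*(?:\.\d+)?)?)'
    r'(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?'
)

def extract_amounts(text):
    """Return every $-amount in text as a float; ranges become their average."""
    amounts = []
    for m in AMOUNT_RX.finditer(text):
        # Strip grouping commas from each end of a (possible) range
        ends = [float(x.replace(",", "")) for x in m.group("number").split("-")]
        multiplier = MULTIPLIERS.get(m.group("suffix"), 1)  # no suffix -> 1
        amounts.append(sum(ends) / len(ends) * multiplier)
    return amounts

print(extract_amounts("$3 million, $910,000,$16.5-18 million"))
# [3000000.0, 910000.0, 17250000.0]
print(extract_amounts("$1,250,000"))
# [1250000.0]
```

Returning a list instead of printing inside the loop makes the result easy to reuse, for example to fill a column of budget values.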