Question

I have this list of strings: $3 million, $910,000, $16.5-18 million [2].

I'm trying to convert them to floats, so $3 million would become 3000000, and for $16.5-18 million I would take the average of 16.5 and 18.
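For the range case, the rule above works out as follows, assuming "million" means a multiplier of 10^6:

```latex
\frac{16.5 + 18}{2} \times 10^{6} = 17.25 \times 10^{6} = 17\,250\,000
```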

I tried using a regular expression, re.search('\$(.*)million', budget).group(1), to find the part between $ and million, but I don't know how to handle the range type ($16.5-18 million).
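A minimal sketch of how the question's own approach could be extended: capture the text between $ and million, split it on "-", and average the pieces. The helper name parse_budget is hypothetical, and the sketch assumes "million" is the only suffix that appears; amounts without it are read as plain numbers.

```python
import re

def parse_budget(budget):
    """Hypothetical helper: convert one budget string such as '$16.5-18 million' to a float."""
    # The question's regex, made non-greedy so trailing spaces stay outside the group
    m = re.search(r'\$(.*?)\s*million', budget)
    if m:
        # "16.5-18" -> ["16.5", "18"]; a single value gives a one-element list
        parts = [float(p.replace(",", "")) for p in m.group(1).split("-")]
        return sum(parts) / len(parts) * 1_000_000
    # No "million" suffix: read the plain number after "$" and drop the commas
    m = re.search(r'\$([\d,.]+)', budget)
    if m is None:
        raise ValueError(f"no dollar amount in {budget!r}")
    return float(m.group(1).replace(",", ""))

for b in ["$3 million", "$910,000", "$16.5-18 million [2]"]:
    print(b, "->", parse_budget(b))
```

Because float() ignores surrounding whitespace, this also copes with a spaced range such as "$16.5 - 18 million".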

Answer 1

I suggest this solution, which extracts the necessary numbers (including ranges) from a larger text and converts them to float values.

import re

def xNumber(arg):          # Return the multiplier for a suffix such as "million"; 1 if the suffix is unknown or missing
    switcher = {
        "mln": 1000000,
        "million": 1000000,
        "bln": 1000000000,
        "billion": 1000000000,
        "thousand": 1000,
        "hundred": 100
    }
    return switcher.get(arg, 1)

rx = re.compile(r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')
s = "$3 million, $910,000,$16.5-18 million"
for match in rx.finditer(s):
    number = match.group("number").replace(",", "")   # Drop the thousands separators before calling float()
    multiplier = xNumber(match.group("suffix"))        # 1 when no suffix was matched
    if "-" in number:                                  # Range: average both ends, then apply the suffix
        lst = [float(x) for x in number.split("-")]
        result = sum(lst) / len(lst) * multiplier
    else:                                              # Single number, with or without a suffix
        result = float(number) * multiplier
    print(result)

See the IDEONE demo.
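To back up the claim that this works on a larger text, here is a sketch that packages the same regex and suffix table into a function returning a list of floats. The names budgets_to_floats and MULTIPLIERS are hypothetical, and the sample sentence is invented for illustration.

```python
import re

# Suffix multipliers, same table as xNumber() in the answer
MULTIPLIERS = {"mln": 10**6, "million": 10**6, "bln": 10**9,
               "billion": 10**9, "thousand": 10**3, "hundred": 100}

# Same pattern as in the answer, split across two raw-string literals
RX = re.compile(r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)'
                r'(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')

def budgets_to_floats(text):
    """Hypothetical helper: return every dollar amount found in text as a float."""
    values = []
    for m in RX.finditer(text):
        ends = [float(x.replace(",", "")) for x in m.group("number").split("-")]
        multiplier = MULTIPLIERS.get(m.group("suffix"), 1)  # suffix is None -> 1
        values.append(sum(ends) / len(ends) * multiplier)   # a range becomes its midpoint
    return values

text = "The film cost $3 million; the sequel's budget was $16.5-18 million [2], and the short cost $910,000."
print(budgets_to_floats(text))  # [3000000.0, 17250000.0, 910000.0]
```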

The regex is r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?':

\$ - the $ sign
(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?) - a number with an optional "," digit-grouping symbol, an optional fractional part, and an optional range
(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))? - matches one of the alternative "suffixes" after zero or more whitespace characters.
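A quick way to see what each named group captures is to print Match.groupdict() for a few sample strings. The last two samples are added here to show the pattern's limits: (?:,\d{3})? allows only one thousands group, so "$1,000,000" yields just "1,000", and the range hyphen must sit directly between the numbers, so "$16.5 - 18 million" matches only "16.5" with no suffix.

```python
import re

# The answer's pattern, unchanged
rx = re.compile(r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)'
                r'(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')

samples = ["$3 million", "$910,000", "$16.5-18 million", "$1,000,000", "$16.5 - 18 million"]
for s in samples:
    m = rx.search(s)
    # groupdict() shows what each named group captured (None when the optional suffix is absent)
    print(repr(s), "->", m.groupdict())
```

Replacing (?:,\d{3})? with (?:,\d{3})* in both places would accept any number of thousands groups.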
