I have this list of strings: `$3 million`, `$910,000`, `$16.5-18 million [2]`.
I am trying to convert them to floats. `$3 million` should become `3000000`, and for `$16.5-18 million` I want the average of `16.5` and `18`.
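Assuming `million` stands for a factor of $10^6$, the two conversions described above work out as:

$$
3 \times 10^{6} = 3\,000\,000, \qquad \frac{16.5 + 18}{2} \times 10^{6} = 17.25 \times 10^{6} = 17\,250\,000
$$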
I tried using a regular expression, `re.search(r'\$(.*)million', budget).group(1)`, to find the part between `$` and `million`, but I don't know how to handle the range case (`$16.5-18 million`).
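One way to extend the `re.search` attempt above is to capture an optional second number after a `-` and average the two ends. This is a minimal sketch that assumes every amount of interest is followed by `million`, so a plain value such as `$910,000` returns `None`; `parse_budget` is a hypothetical helper name, not part of the question.

```python
import re

def parse_budget(budget):
    # Hypothetical helper: the question's "$...million" search, plus an
    # optional "-<number>" group. Assumes the amount is given in millions.
    m = re.search(r'\$([\d.]+)(?:\s*-\s*([\d.]+))?\s*million', budget)
    if m is None:
        return None  # no "$... million" amount in this string
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low  # single value: low == high
    return (low + high) / 2 * 1_000_000

print(parse_budget("$3 million"))            # 3000000.0
print(parse_budget("$16.5-18 million [2]"))  # 17250000.0
print(parse_budget("$910,000"))              # None
```

The `\s*` around `-` also accepts the spaced form `$16.5 - 18 million`.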
Answer (score: 2)
I suggest this solution. It extracts the necessary numbers (including ranges) from a larger text and converts them to float values.
import re

def xNumber(arg):  # Parse the suffix and return the corresponding multiplier, else 1
    switcher = {
        "mln": 1000000,
        "million": 1000000,
        "bln": 1000000000,
        "billion": 1000000000,
        "thousand": 1000,
        "hundred": 100
    }
    return switcher.get(arg, 1)

rx = re.compile(r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')
s = "$3 million, $910,000,$16.5-18 million"
for match in rx.finditer(s):
    number = match.group("number").replace(",", "")  # Remove the digit grouping commas
    multiplier = xNumber(match.group("suffix"))       # 1 if there is no suffix
    if "-" in number:  # Range: take the average of both ends
        lst = [float(x) for x in number.split("-")]
        result = sum(lst) / len(lst) * multiplier
    else:  # Single number, with or without a suffix
        result = float(number) * multiplier
    print(result)  # 3000000.0, then 910000.0, then 17250000.0
See the IDEONE demo.
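Because the question starts from a list of strings rather than one long string, it may be convenient to wrap the same idea in a function that returns floats instead of printing them. This is a sketch: `to_floats` and `MULTIPLIERS` are hypothetical names, and the pattern is the answer's regex unchanged.

```python
import re

# Hypothetical name: multipliers for the suffixes the answer's regex recognises.
MULTIPLIERS = {"mln": 10**6, "million": 10**6, "bln": 10**9,
               "billion": 10**9, "thousand": 10**3, "hundred": 100}

# Same pattern as in the answer, split over two adjacent raw strings.
RX = re.compile(r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)'
                r'(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')

def to_floats(text):
    """Hypothetical wrapper: return one float per $-amount found in text."""
    values = []
    for match in RX.finditer(text):
        ends = [float(x) for x in match.group("number").replace(",", "").split("-")]
        average = sum(ends) / len(ends)  # a plain number is a "range" of one
        values.append(average * MULTIPLIERS.get(match.group("suffix"), 1))
    return values

budgets = ["$3 million", "$910,000", "$16.5-18 million [2]"]
print([v for b in budgets for v in to_floats(b)])
# [3000000.0, 910000.0, 17250000.0]
```

Treating a single number as a one-element range keeps the averaging step uniform, so the range case needs no separate branch.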
The regex is `r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?'`:
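- `\$` - a `$` sign
- `(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)` - a number with an optional `,` digit grouping symbol (at most one `,\d{3}` group), an optional fractional part, and an optional range (`-` followed by a second number in the same format)
- `(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?` - an optional "suffix" from the listed alternatives, after zero or more whitespace characters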
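To check what the two named groups capture, and to see one limitation of `(?:,\d{3})?` (it accepts at most one comma group, so `$1,250,000` is cut short), the sketch below runs the answer's pattern alongside a variant that uses `(?:,\d{3})*`. The `*` variant is an assumed tweak, not part of the answer.

```python
import re

# The answer's pattern: (?:,\d{3})? allows at most ONE thousands group.
one_group = re.compile(r'\$(?P<number>\d+(?:,\d{3})?(?:\.\d+)?(?:-\d+(?:,\d{3})?(?:\.\d+)?)?)'
                       r'(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')
# Assumed variant: (?:,\d{3})* allows any number of thousands groups.
any_groups = re.compile(r'\$(?P<number>\d+(?:,\d{3})*(?:\.\d+)?(?:-\d+(?:,\d{3})*(?:\.\d+)?)?)'
                        r'(?:\s*(?P<suffix>mln|million|bln|billion|thousand|hundred))?')

for m in one_group.finditer("$3 million, $910,000,$16.5-18 million"):
    print(m.group("number"), m.group("suffix"))
# 3 million
# 910,000 None
# 16.5-18 million

print(one_group.search("$1,250,000").group("number"))   # 1,250
print(any_groups.search("$1,250,000").group("number"))  # 1,250,000
```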