I have a column containing values such as "5.00 M", "1.00 T", and "1.29 Juta", and I want a simple way to convert them to numeric values. This is what I tried:
import re
powers = {'M': 10 ** 9, 'T': 10 ** 12, 'Juta': 10 ** 6}
var1 = ['4', '7149', '6184.09', '0.00', '8', '134944', '5187.33', '5.00 M', '17', '74104', '60773.22', '260.00 M', '7', '347334', '451922.68', '1.00 T', '80', '18469', '483386.83', '2.50 M', '12', '4716', '14946.30', '0.00', '18', '7119', '111617.66', '0.00', '31', '23131', '814413.09', '0.00', '21', '16281', '192020.50', '0.00', '20', '98381', '57850.37', '0.00', '31', '12501', '39384.40', '0.00', '31', '2851', '1.29 Juta', '0.00', '34', '9440', '171364.82', '0.00', '26', '25442', '54394.00', '0.00', '24', '2492', '165295.95', '0.00', '12', '675', '51301.40', '0.00', '7', '5', '8057.77', '0.00', '6', '704', '35579.19', '0.00', '5', '2133', '15683.20', '0.00', '3', '1356', '5021.00', '0.00', '3', '966', '5456.32', '0.00', '5', '2636', '4097.42', '0.00', '8', '1878', '4554.50', '0.00', '6', '3518', '13900.00', '0.00', '2', '1', '61000.00', '0.00', '3', '0', '1688.00', '0.00', '4', '10', '1488.33', '0.00', '0', '0', '0.00', '0.00', '0', '0', '0.00', '0.00', '2', '0', '4054.00', '0.00', '0', '0', '0.00', '0.00']
def f(num_str):
    match = re.search(r"([0-9\.]+)\s?(M|T|Juta)", num_str)
    if match is not None:
        quantity = match.group(0)
        magnitude = match.group(1)
        return float(quantity) * powers[magnitude]

for i in var1:
    x = f(i)
    print(x)
But I get this error:
None
None
None
None
None
None
None
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-8dd2f89076c3> in <module>
      1 for i in var1:
----> 2     x = f(i)
      3     print(x)

<ipython-input-22-cb419bc71fb8> in f(num_str)
      7         quantity = match.group(0)
      8         magnitude = match.group(1)
----> 9         return float(quantity) * powers[magnitude]

ValueError: could not convert string to float: '5.00 M'
Answer 0 (score: 4)
Just use group(1) and group(2), because group(0) holds the entire matched string:
import re
powers = {'M': 10 ** 9, 'T': 10 ** 12, 'Juta': 10 ** 6}
var1 = ['4', '7149', '6184.09', '0.00', '8', '134944', '5187.33', '5.00 M', '17', '74104', '60773.22', '260.00 M', '7', '347334', '451922.68', '1.00 T', '80', '18469', '483386.83', '2.50 M', '12', '4716', '14946.30', '0.00', '18', '7119', '111617.66', '0.00', '31', '23131', '814413.09', '0.00', '21', '16281', '192020.50', '0.00', '20', '98381', '57850.37', '0.00', '31', '12501', '39384.40', '0.00', '31', '2851', '1.29 Juta', '0.00', '34', '9440', '171364.82', '0.00', '26', '25442', '54394.00', '0.00', '24', '2492', '165295.95', '0.00', '12', '675', '51301.40', '0.00', '7', '5', '8057.77', '0.00', '6', '704', '35579.19', '0.00', '5', '2133', '15683.20', '0.00', '3', '1356', '5021.00', '0.00', '3', '966', '5456.32', '0.00', '5', '2636', '4097.42', '0.00', '8', '1878', '4554.50', '0.00', '6', '3518', '13900.00', '0.00', '2', '1', '61000.00', '0.00', '3', '0', '1688.00', '0.00', '4', '10', '1488.33', '0.00', '0', '0', '0.00', '0.00', '0', '0', '0.00', '0.00', '2', '0', '4054.00', '0.00', '0', '0', '0.00', '0.00']
def f(num_str):
    match = re.search(r"([0-9\.]+)\s?(M|T|Juta)", num_str)
    if match is not None:
        quantity = match.group(1)   # numeric part, e.g. '5.00'
        magnitude = match.group(2)  # suffix, e.g. 'M'
        return float(quantity) * powers[magnitude]
    else:
        # No suffix found: the input is returned unchanged, still as a string.
        return num_str

for i in var1:
    x = f(i)
    print(x)
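To see why group(0) fails, here is a minimal sketch that runs the same pattern on a single sample value and shows what each group holds. The names `pattern`, `m`, `whole`, `number`, and `suffix` are illustrative and do not come from the original post.

```python
import re

# Same pattern as in the question.
pattern = r"([0-9\.]+)\s?(M|T|Juta)"
m = re.search(pattern, "5.00 M")

whole = m.group(0)   # the entire match, suffix included
number = m.group(1)  # first capturing group: the numeric part
suffix = m.group(2)  # second capturing group: the magnitude suffix

print(repr(whole))   # '5.00 M'
print(repr(number))  # '5.00'
print(repr(suffix))  # 'M'
```

Passing `whole` to float() raises the ValueError shown in the question, whereas `number` converts cleanly.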
Answer 1 (score: 0)
Besides using the wrong group numbers, your regular expression has a few other problems. You can fix it as follows:
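import re

# Reuses powers and var1 as defined in the question.
def f(num_str):
    # Numeric part replaced; the suffix group is now optional (note the ? after it).
    match = re.search(r"(\d+(?:\.\d+)?)\s?(M|T|Juta)?", num_str)
    if match is not None:
        quantity = match.group(1)
        if match.group(2):  # check whether a magnitude suffix is present
            magnitude = match.group(2)
            return float(quantity) * powers[magnitude]
        else:  # no suffix: the value is already a plain number
            return float(quantity)

for i in var1:
    x = f(i)
    print(x)
In fact, the numeric part of your regex, [0-9\.]+, is not correct: it accepts any run of digits and dots, including strings such as "1.2.3" or "..", which float() cannot convert. It is better to use \d+(?:\.\d+)?, where \d+ is the integer part and (?:\.\d+)? is an optional fractional part. The fractional part is wrapped in (?: ) to make it a non-capturing group, and the dot is escaped as \. so that it matches only a literal decimal point rather than any character.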
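The difference between the two number patterns can be checked directly. The sketch below is one possible stricter variant, not the answerer's code: it uses re.fullmatch so that the whole string must be a valid amount, with the dot escaped so that it matches only a literal decimal point. The names `POWERS`, `AMOUNT_RE`, `parse_amount`, `loose_ok`, and `strict_ok` are hypothetical, and raising ValueError on unrecognized input is an assumption about the desired behavior.

```python
import re

# Multipliers copied from the question: M = 10**9, T = 10**12, Juta = 10**6.
POWERS = {"M": 10 ** 9, "T": 10 ** 12, "Juta": 10 ** 6}

# Integer part, optional fractional part, optional whitespace, optional suffix.
AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(M|T|Juta)?")

def parse_amount(text):
    """Return the value of text as a float, or raise ValueError."""
    match = AMOUNT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    number, suffix = match.groups()
    # A missing suffix (None) falls back to a multiplier of 1.
    return float(number) * POWERS.get(suffix, 1)

# The looser character class from the question accepts malformed input,
# while the stricter pattern rejects it.
loose_ok = re.fullmatch(r"[0-9\.]+", "1.2.3") is not None
strict_ok = re.fullmatch(r"\d+(?:\.\d+)?", "1.2.3") is not None

print(loose_ok, strict_ok)     # True False
print(parse_amount("5.00 M"))  # 5000000000.0
print(parse_amount("6184.09")) # 6184.09
```

With the list from the question, `[parse_amount(v) for v in var1]` should give a float for every entry, since each one is either a plain number or a number followed by one of the three suffixes.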