Here is a list of strings that contains the data I need.
['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m',
'10.25 m', '107', '132', '752 m³', '5 750 km', 'M0.82', '68.40 tonnes',
'68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']
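The strings contain spaces and other characters, which makes it hard to extract the numbers. I tried a regular expression, but it does not seem to work.
Here is my code: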
for i in data_spe:  # data_spe is the list that stores the strings
    i = re.findall(r'\d+\d\.\d', i)
    print(i)
The output I need is:
[31.44, 21.38, 3.95, 3.70, 34.10, 12.56, 7.59, 10.25, 107, 132, 752, 5750, 0.82, 68.40, 68.00, 57.50, 54.50, 24210]
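Answer 0 (score: 3)
You can use a regular expression that searches for a digit followed by any run of digits, decimal points, and spaces, and then strip the spaces from the match.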
import re
d = ['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m',
'10.25 m', '107', '132', '752 m', '5 750 km', 'M0.82', '68.40 tonnes',
'68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']
[re.search(r'\d[\d\. ]*', x).group().replace(' ','') for x in d]
# returns:
['31.44', '21.38', '3.95', '3.70', '34.10', '12.56', '7.59', '10.25', '107', '132', '752',
'5750', '0.82', '68.40', '68.00', '57.50', '54.50', '24210']
If you want the results as numbers rather than strings, you can use:
[float(re.search(r'\d[\d\. ]*', x).group().replace(' ','')) for x in d]
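The requested output keeps whole numbers such as 107 as integers, whereas float() turns every value into a float, and re.search() returns None (so .group() raises AttributeError) when a string contains no digit at all. A minimal sketch that covers both cases, assuming the same pattern as above; the helper name to_number and the 'n/a' sample entry are made up for illustration:

```python
import re

# Same idea as the pattern above: a digit followed by digits, dots, or spaces.
NUMBER = re.compile(r'\d[\d. ]*')

def to_number(text):
    """Hypothetical helper: return an int or float parsed from text, or None."""
    match = NUMBER.search(text)
    if match is None:
        return None  # no digit anywhere in the string
    digits = match.group().replace(' ', '')
    # Keep whole numbers as int so '107' becomes 107 rather than 107.0.
    return float(digits) if '.' in digits else int(digits)

d = ['31.44 m', '107', '5 750 km', 'M0.82', '24 210\xa0litres', 'n/a']
print([to_number(x) for x in d])
# prints: [31.44, 107, 5750, 0.82, 24210, None]
```

Note that 3.70 still prints as 3.7 once it is a float; if the trailing zero matters, keep the values as strings.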
Answer 1 (score: 0)
An optimized approach using re.search():
import re
lst = ['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m',
'10.25 m', '107', '132', '752 m³', '5 750 km', 'M0.82', '68.40 tonnes',
'68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']
pat = re.compile(r'\d+(\.\d+)?')  # compile the pattern once; the dot is escaped so it matches only a literal '.'
result = [pat.search(i.replace(' ', '')).group() for i in lst]
print(result)
Output:
['31.44', '21.38', '3.95', '3.70', '34.10', '12.56', '7.59', '10.25', '107', '132', '752', '5750', '0.82', '68.40', '68.00', '57.50', '54.50', '24210']
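One thing to watch with this approach: str.replace(' ', '') removes only ordinary spaces, so a non-breaking space used as a thousands separator would not be handled. A possible variant, sketched under the assumption that all whitespace inside a string can safely be dropped; the helper name extract and the sample '1\xa0234.5 kg' are made up for illustration:

```python
import re

WHITESPACE = re.compile(r'\s+')        # for str patterns, \s also matches '\xa0'
NUMBER = re.compile(r'\d+(?:\.\d+)?')  # digits, optionally a literal dot and more digits

def extract(text):
    """Hypothetical helper: return the first number in text as a string, or None."""
    match = NUMBER.search(WHITESPACE.sub('', text))
    return match.group() if match else None

lst = ['5 750 km', '24 210\xa0litres', '1\xa0234.5 kg', 'M0.82', '752 m³']
print([extract(s) for s in lst])
# prints: ['5750', '24210', '1234.5', '0.82', '752']
```

The trade-off is that two separate numbers in one string, such as '10 20', would be merged into '1020', so this only suits data with one value per string.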
Answer 2 (score: 0)
data = ['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m',
'10.25 m', '107', '132', '752 m', '5 750 km', 'M0.82', '68.40 tonnes',
'68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']
def get_numerical_value(data):
    for val in data:
        # keep only the digits and decimal points of each string
        get_number = ''.join([num for num in val if num.isdigit() or num == '.'])
        if get_number:
            yield float(get_number)

get_values = get_numerical_value(data)
print(list(get_values))
>>>[31.44, 21.38, 3.95, 3.7, 34.1, 12.56, 7.59, 10.25, 107.0, 132.0, 752.0, 5750.0, 0.82, 68.4, 68.0, 57.5, 54.5, 24210.0]
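A caveat worth noting: the list in this answer contains '752 m', while the question's data has '752 m³'. str.isdigit() returns True for the superscript '³', so the character filter would produce '752³' and float() would raise ValueError. A minimal sketch of one way around this, assuming only ASCII digits should count; the name get_numerical_values is made up for illustration:

```python
def get_numerical_values(data):
    """Variant of the generator above restricted to ASCII digits and '.'."""
    allowed = set('0123456789.')
    for val in data:
        number = ''.join(ch for ch in val if ch in allowed)
        if number:
            yield float(number)

data = ['752 m³', '5 750 km', 'M0.82', '24 210\xa0litres']
print('³'.isdigit())  # prints: True
print(list(get_numerical_values(data)))
# prints: [752.0, 5750.0, 0.82, 24210.0]
```

Like the original, this joins every digit in the string, so it assumes each string holds exactly one number.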