How do I extract the numbers from the following strings?

Time: 2018-03-30 12:40:45

Tags: python regex

Here is a list of strings; it contains the data I need.

['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m', 
 '10.25 m', '107', '132', '752 m³', '5 750 km', 'M0.82', '68.40 tonnes', 
 '68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']

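The strings contain spaces and other characters, which makes it hard for me to get at the numbers. I have tried a regular expression, but it does not seem to work.

The code is below.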

for i in data_spe:#data_spe is the list I used to store the strings(or data)
    i=re.findall('\d+\d\.\d',i)
    print(i)

The output I need:

[31.44,21.38,3.95,3.70,34.10,12.56,7.59,10.25,107,132,752,5750,0.82,68.40,68.00,57.50,24210]

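3 answers:

Answer 0: (score: 3)

You can use a regular expression that matches a digit followed by any run of digits, decimal points, and spaces, and then remove the spaces from the match.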

import re

d = ['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m',
 '10.25 m', '107', '132', '752 m', '5 750 km', 'M0.82', '68.40 tonnes',
 '68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']

[re.search(r'\d[\d\. ]*', x).group().replace(' ','') for x in d]

# returns:
['31.44', '21.38', '3.95', '3.70', '34.10', '12.56', '7.59', '10.25', '107', '132', '752',
 '5750', '0.82', '68.40', '68.00', '57.50', '54.50', '24210']

If you want the results as numbers rather than strings, you can use:

[float(re.search(r'\d[\d\. ]*', x).group().replace(' ','')) for x in d]
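One thing to watch with the list comprehensions above: `re.search` returns `None` for a string that contains no digit, so calling `.group()` on it raises `AttributeError`. The following is a minimal sketch of one way to guard against that, not part of the original answer. The helper name `extract_number` and the `'n/a'` sample entry are made up for illustration, and the sketch assumes each string holds at most one decimal point. It also returns an `int` when there is no decimal point, which is closer to the output requested in the question.

```python
import re

# Hypothetical helper, not from the original answer.
# \s covers ordinary spaces as well as the non-breaking space '\xa0'.
NUMBER = re.compile(r'\d[\d.\s]*')

def extract_number(text):
    """Return the first number in text as int or float, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    # Drop every whitespace character used as a thousands separator.
    digits = re.sub(r'\s', '', match.group())
    return float(digits) if '.' in digits else int(digits)

d = ['31.44 m', '107', '5 750 km', 'M0.82', '24 210\xa0litres', 'n/a']
print([extract_number(x) for x in d])
# [31.44, 107, 5750, 0.82, 24210, None]
```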

Answer 1: (score: 0)

An optimized re.search() approach:

import re

lst = ['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m', 
    '10.25 m', '107', '132', '752 m³', '5 750 km', 'M0.82', '68.40 tonnes', 
    '68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']

pat = re.compile(r'\d+(?:\.\d+)?')   # compile the pattern once; '\.' matches a literal decimal point
result = [pat.search(i.replace(' ', '')).group() for i in lst]

print(result)

Output:

['31.44', '21.38', '3.95', '3.70', '34.10', '12.56', '7.59', '10.25', '107', '132', '752', '5750', '0.82', '68.40', '68.00', '57.50', '54.50', '24210']
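A note on the decimal point in a pattern like this: it has to be written as `\.`, because an unescaped `.` matches any character. On the list above both spellings happen to give the same result, but they differ on other input. A small sketch of the difference, where the `sample` string is a made-up input that does not appear in the question:

```python
import re

loose = re.compile(r'\d+(.\d+)?')       # '.' matches any character
strict = re.compile(r'\d+(?:\.\d+)?')   # '\.' matches only a literal dot

sample = '12x34 m'   # hypothetical input, not in the original list

print(loose.search(sample).group())    # 12x34
print(strict.search(sample).group())   # 12
```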

Answer 2: (score: 0)

data = ['31.44 m', '21.38 m', '3.95 m', '3.70 m', '34.10 m', '12.56 m', '7.59 m', 
 '10.25 m', '107', '132', '752 m', '5 750 km', 'M0.82', '68.40 tonnes', 
 '68.00 tonnes', '57.50 tonnes', '54.50 tonnes', '24 210\xa0litres']

def get_numerical_value(data):
    for val in data:
        # keep only the digits and decimal points of each string
        get_number = ''.join([num for num in val if num.isdigit() or num == '.'])
        if get_number:
            yield float(get_number)

get_values = get_numerical_value(data)
print(list(get_values))

# Output:
# [31.44, 21.38, 3.95, 3.7, 34.1, 12.56, 7.59, 10.25, 107.0, 132.0, 752.0, 5750.0, 0.82, 68.4, 68.0, 57.5, 54.5, 24210.0]
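A caveat if this is run on the list from the question, which contains '752 m³' rather than '752 m': `str.isdigit()` is also true for the superscript '³', so the filtered string becomes '752³' and `float()` raises `ValueError`. Below is a minimal sketch of a variant that keeps only ASCII digits and the decimal point. The names `ALLOWED` and `get_numerical_values` are made up for illustration and are not part of the original answer.

```python
import string

# Hypothetical variant: accept ASCII digits and the decimal point only.
ALLOWED = set(string.digits + '.')

def get_numerical_values(data):
    for val in data:
        number = ''.join(ch for ch in val if ch in ALLOWED)
        if number:
            yield float(number)

print('³'.isdigit())   # True, which is why the superscript slips through isdigit()
print(list(get_numerical_values(['752 m³', '5 750 km', '24 210\xa0litres'])))
# [752.0, 5750.0, 24210.0]
```

Like the original, this still keeps any stray '.' that appears elsewhere in a string, so it assumes the only dot in each entry is the decimal point.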