Extracting strings from a list

Time: 2019-10-14 01:40:57

Tags: python string

I have a list:

my_list = ['A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
 'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
 'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
 'A70-7600; reprint; rolled; 1986; 26.5 x 38.5',
 'A70-6912; reprint; style C; rolled; 1977; 26.5 x 38.5',
 'A70-8692; reprint; regular; rolled; 1995; 26.5 x 38.5',
 'A70-2978; reprint; rolled; 1991; 26.5 x 38.5',
 'A70-4902; reprint; Style A; rolled; 1999; 26.5 x 38.5',
 'A70-6300; reprint; regular; rolled; 1983; 26.5 x 38.5',
 'MPW-6725; reprint; rolled; 1966; 26.5 x 38']

I want to extract the strings that contain 'x' (e.g. 26.5 x 38.5). I tried:

string = [i if 'x' in i else np.nan for i in str(my_string).split(';')]

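This puts nan wherever the condition is not met, but that is as far as I got. Is it possible to get the strings I want, both with and without the nan placeholders?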

6 answers:

Answer 0 (score: 3)

You need a nested list comprehension to reach each substring of each item in the list.

[x for s in my_list for x in s.split('; ') if 'x' in x]

Result:

['26.5 x 38.5', '26.5 x 38.5', '22.5 x 34.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38']

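re is a better fit for this, though, because testing only if 'x' in x can return unwanted results:

import re
# Raw string; the decimal part is optional so that '26.5 x 38' also matches.
p = re.compile(r"\d+(?:\.\d+)? x \d+(?:\.\d+)?")
[m.group(0) for m in map(p.search, my_list) if m]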
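To illustrate the kind of unwanted result meant here, a small sketch follows. The row is hypothetical (none of the question's data contains a stray x), and the pattern is an assumption about what a size looks like: two numbers with optional decimal parts separated by " x ".

```python
import re

# Hypothetical row: "deluxe" and "box set" contain the letter x but are not sizes.
row = 'A70-0001; deluxe reprint; box set; 1990; 26.5 x 38.5'

# Substring test: every field that contains an "x" passes.
by_substring = [f for f in row.split('; ') if 'x' in f]

# Assumed pattern: number, " x ", number, each with an optional decimal part.
size_re = re.compile(r'\d+(?:\.\d+)? x \d+(?:\.\d+)?')
by_regex = size_re.findall(row)

print(by_substring)
print(by_regex)
```

The substring test keeps all three fields that contain an x, while the pattern keeps only the size.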

Answer 1 (score: 2)

outputs = [subitem for item in my_list for subitem in item.split(';') if 'x' in subitem]
print(outputs)

Output:

[' 26.5 x 38.5', ' 26.5 x 38.5', ' 22.5 x 34.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38']
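The leading space in each result comes from splitting on ';' alone. If that is not wanted, one possible variant strips each field before keeping it. This is only a sketch on two rows from the question; the name field is mine.

```python
# Two rows from the question, enough to show the effect.
my_list = [
    'A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
    'MPW-6725; reprint; rolled; 1966; 26.5 x 38',
]

# Split on ';' and strip the surrounding whitespace from each kept field.
outputs = [
    field.strip()
    for item in my_list
    for field in item.split(';')
    if 'x' in field
]
print(outputs)
```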

Answer 2 (score: 1)

A list comprehension for this can get ugly; I suggest using two separate for loops for readability.

my_list = ['A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
 'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
 'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
 'A70-7600; reprint; rolled; 1986; 26.5 x 38.5',
 'A70-6912; reprint; style C; rolled; 1977; 26.5 x 38.5',
 'A70-8692; reprint; regular; rolled; 1995; 26.5 x 38.5',
 'A70-2978; reprint; rolled; 1991; 26.5 x 38.5',
 'A70-4902; reprint; Style A; rolled; 1999; 26.5 x 38.5',
 'A70-6300; reprint; regular; rolled; 1983; 26.5 x 38.5',
 'MPW-6725; reprint; rolled; 1966; 26.5 x 38']


multiplications = []
for item in my_list:
    for subitem in item.split(';'):
        if 'x' in subitem:
            multiplications.append(subitem.strip())

print('\n'.join(multiplications))

This outputs:

26.5 x 38.5
26.5 x 38.5
22.5 x 34.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38

Answer 3 (score: 1)

string = [i for my_string in my_list for i in str(my_string).split(';') if 'x' in i ]
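The question also asked about keeping nan placeholders. A rough sketch of both variants follows, assuming one entry per field is wanted; float('nan') stands in for np.nan here so that NumPy is not required, and the names are illustrative.

```python
my_list = [
    'A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
    'MPW-6725; reprint; rolled; 1966; 26.5 x 38',
]

# float('nan') stands in for np.nan so the sketch needs no NumPy import.
NAN = float('nan')

# One entry per field: the stripped field if it contains 'x', otherwise nan.
with_nan = [
    field.strip() if 'x' in field else NAN
    for row in my_list
    for field in row.split(';')
]

# Dropping the placeholders afterwards leaves only the matching strings.
without_nan = [value for value in with_nan if isinstance(value, str)]

print(with_nan)
print(without_nan)
```

Putting the condition before for (x if cond else y) keeps every field and substitutes a placeholder, while putting it after the loops filters fields out.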

Answer 4 (score: 0)

Yes, if you only want to extract the strings that contain 'x', you can do:

sep = ';'.join(my_list).split(';')  # join with ';' so the last field of one item does not fuse with the first field of the next

with_x = filter(lambda str_: 'x' in str_, sep)

for i in with_x:
    print(i)
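A possible alternative that avoids building one big string is to flatten the per-row fields with itertools.chain.from_iterable. This is a sketch on two rows from the question. Note that filter returns a lazy iterator in Python 3, so it is wrapped in list() here to allow reuse.

```python
from itertools import chain

my_list = [
    'A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
    'MPW-6725; reprint; rolled; 1966; 26.5 x 38',
]

# Flatten the fields of every row lazily instead of joining the rows first.
fields = chain.from_iterable(row.split(';') for row in my_list)

# filter() is lazy in Python 3; list() materialises the matches.
with_x = list(filter(lambda field: 'x' in field, fields))

for field in with_x:
    print(field.strip())
```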

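Answer 5 (score: 0)

Here is a regex-based solution. It is more robust than the other solutions provided, because it works even when the desired string is not preceded by a ;.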

import re

# The decimal part is optional so that '26.5 x 38' (last item) also matches.
reg = re.compile(r'\b\d+(?:\.\d+)? x \d+(?:\.\d+)?\b')

new_list = []

for elem in my_list:
  result = re.search(reg, elem)
  if result:
    new_list.append(result.group(0))
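To show what "works without a ;" means in practice, here is a short sketch. The second and third inputs are hypothetical variations on the question's data, and the pattern assumes that the decimal part of each number is optional.

```python
import re

# Hypothetical inputs: the same kind of size, with different or missing separators.
samples = [
    'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
    'A70-713, reprint, rolled, 1980, 26.5 x 38.5',
    'rolled reprint 22.5 x 34.5 (1939)',
]

# Assumed pattern: number, " x ", number, each with an optional decimal part.
size_re = re.compile(r'\b\d+(?:\.\d+)? x \d+(?:\.\d+)?\b')

found = [m.group(0) for m in map(size_re.search, samples) if m]
print(found)
```

A split-based approach would need to know the separator in advance; the pattern only depends on the shape of the size itself.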