Slicing a 2D list in Python based on a regex match

Time: 2017-11-03 12:53:00

Tags: python arrays regex

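I have a 2D array with many rows and two columns. I want to extract all rows of this 2D array whose first-column string contains any of the substrings in a list.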

See the contrived example below.

students_mat = [['Ant', 'Bat', 'Cat', 'Dog', 'Ear'] , [3.75, 4.32, 2.43, 3.73,  2.51]]


subset_ids = ['A', 'B', 'E']

The output I want is:

accepted_std = [['Ant', 'Bat', 'Ear'] , [3.75, 4.32, 2.51]]

I tried:

accepted_std = [s for s in students_mat if any(xs in s for xs in subset_ids)]

This does not work, although the same approach works for a one-dimensional list.

Thanks & regards, Santosh
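For reference, a short sketch of why the attempt above comes back empty while the one-dimensional version works. It only reuses the data from the question; the names `attempt` and `flat_result` are illustrative.

```python
students_mat = [['Ant', 'Bat', 'Cat', 'Dog', 'Ear'], [3.75, 4.32, 2.43, 3.73, 2.51]]
subset_ids = ['A', 'B', 'E']

# Iterating over students_mat yields the two inner lists, not the names.
# `xs in s` then asks whether the list s has an element equal to xs
# (e.g. 'A' == 'Ant'), which is False for every id.
attempt = [s for s in students_mat if any(xs in s for xs in subset_ids)]
print(attempt)

# On a flat list of names, `xs in name` is a substring test on a string,
# which is why the same expression works in the one-dimensional case.
flat_result = [name for name in students_mat[0]
               if any(xs in name for xs in subset_ids)]
print(flat_result)
```

So the filter has to look at the individual names in the first inner list and carry the matching positions over to the second one.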

3 answers:

Answer 0 (score: 2)

students_mat = [['Ant', 'Bat', 'Cat', 'Dog', 'Ear'] , [3.75, 4.32, 2.43, 3.73,  2.51]]
subset_ids = ['A', 'B', 'E']

# we need corresponding pairs from first and second subarrays
pairs = zip(*students_mat)

# filtering
filtered_pairs = ((x, y) for x, y in pairs if x[0] in subset_ids)

# returning to original form
original = zip(*filtered_pairs)

# converting to list of lists
accepted_std = list(map(list, original))
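The code above compares only the first character of each name, while the question title asks about regex matching. Below is a minimal sketch of the same zip / filter / zip pipeline using the `re` module. It assumes the entries of `subset_ids` are literal text that should match at the start of the name; that is an assumption, and `pattern.search` could be swapped in to match anywhere in the name. The names `pattern` and `kept` are illustrative.

```python
import re

students_mat = [['Ant', 'Bat', 'Cat', 'Dog', 'Ear'], [3.75, 4.32, 2.43, 3.73, 2.51]]
subset_ids = ['A', 'B', 'E']

# Build one alternation pattern; re.escape keeps every id literal.
pattern = re.compile('|'.join(re.escape(s) for s in subset_ids))

# pattern.match only matches at the start of the string (prefix match).
kept = [(name, score) for name, score in zip(*students_mat) if pattern.match(name)]

# Transpose back into a list of lists. If nothing matched, zip(*kept) yields
# nothing, so fall back to one empty list per original inner list.
accepted_std = [list(col) for col in zip(*kept)] or [[] for _ in students_mat]
print(accepted_std)
```

Matching is case-sensitive here, so the lowercase 'a' in 'Cat' does not count as a hit for 'A'.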

Answer 1 (score: 0)

You can try this:

students_mat = [['Ant', 'Bat', 'Cat', 'Dog', 'Ear'] , [3.75, 4.32, 2.43, 3.73,  2.51]]
subset_ids = ['A', 'B', 'E']
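# positions of the names whose first character is one of the ids
indices = {i for i, a in enumerate(students_mat[0]) if a[0] in subset_ids}
# keep only those positions in every inner list
final_data = [[c for i, c in enumerate(a) if i in indices] for a in students_mat]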

Output:

[['Ant', 'Bat', 'Ear'], [3.75, 4.32, 2.51]]
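The same position-based idea can also be written with a boolean mask and `itertools.compress`, which avoids a membership test for every element. This is a sketch, not part of the original answer; it assumes prefix matching via `str.startswith`, and the names `prefixes` and `mask` are illustrative.

```python
from itertools import compress

students_mat = [['Ant', 'Bat', 'Cat', 'Dog', 'Ear'], [3.75, 4.32, 2.43, 3.73, 2.51]]
subset_ids = ['A', 'B', 'E']

# str.startswith accepts a tuple of prefixes and returns True if any matches.
prefixes = tuple(subset_ids)
mask = [name.startswith(prefixes) for name in students_mat[0]]

# compress keeps the items of each inner list where the mask is True.
final_data = [list(compress(row, mask)) for row in students_mat]
print(final_data)
```

The mask is computed once from the first inner list and then reused for every other inner list, so it extends unchanged to more than two lists.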

Answer 2 (score: 0)

You need to convert the two lists into a list of tuples, filter it, and then convert the result back into two lists:

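# zip() returns an iterator of tuples in Python 3, so build real lists from it
accepted_std = [
    list(column) for column in zip(*[
        (txt, num) for (txt, num) in zip(students_mat[0], students_mat[1])
        if any(xs in txt for xs in subset_ids)
    ])
]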

You can use zip to build a sequence of tuples from the two plain lists:

for (txt, num) in zip(students_mat[0], students_mat[1])

Then you can filter on the txt variable with the same approach as in the question:

if any(xs in txt for xs in subset_ids)

Finally, you need to unzip the list of tuples. zip(*list) does exactly that.
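The three steps above can be put together in a small helper. This is a sketch under stated assumptions: `filter_columns` is a hypothetical name, not something from the answer, and the fallback for an empty result is an added design choice.

```python
def filter_columns(matrix, substrings):
    """Keep the positions whose first-list entry contains any of `substrings`."""
    # Step 1: zip(*matrix) pairs up the entries position by position.
    columns = zip(*matrix)
    # Step 2: filter on the text taken from the first inner list.
    kept = [col for col in columns if any(s in col[0] for s in substrings)]
    # Step 3: unzip. zip(*[]) yields nothing, so an empty result falls back
    # to one empty list per original inner list.
    return [list(row) for row in zip(*kept)] or [[] for _ in matrix]


students_mat = [['Ant', 'Bat', 'Cat', 'Dog', 'Ear'], [3.75, 4.32, 2.43, 3.73, 2.51]]
subset_ids = ['A', 'B', 'E']

print(filter_columns(students_mat, subset_ids))
print(filter_columns(students_mat, ['Z']))
```

Without the fallback in step 3, a filter that matches nothing would return an empty list instead of two empty inner lists, which can surprise code that indexes the result as `result[0]` and `result[1]`.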