Question

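Hi, I'm trying to filter a nested list so that it only contains the strings that include the word "yellow". My goal is to store each color separately in its own data frame.

I tried labels.str.split('yellow'), but it just tells me that the 'list' object has no attribute 'str'.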

[['Example1 (purple)',   
  ' Example2 (blue)',
  ' Example3 (orange)',
  ' Example4 (yellow)',
  ' Example5 (red)',
  ' Example6 (pink)',
  ' Example7 (sky)'],
 ['Example8 (purple)',
  ' Example9 (blue)',
  ' Example10 (orange)',
  ' Example11 (sky)',
  ' Example12 (green)',
  ' Example13 (green)',
  ' Example14 (yellow)',
  ' Example15 (red)',
  ' Example16 (pink)',
  ' Example17 (pink)',
  ' Example18 (green)',
  ' Example19 (sky)']]

Answer 1

Import the necessary packages and initialize the data:

import pandas as pd
import re

my_list = [['Example1 (purple)',
  ' Example2 (blue)',
  ' Example3 (orange)',
  ' Example4 (yellow)',
  ' Example5 (red)',
  ' Example6 (pink)',
  ' Example7 (sky)'],
 ['Example8 (purple)',
  ' Example9 (blue)',
  ' Example10 (orange)',
  ' Example11 (sky)',
  ' Example12 (green)',
  ' Example13 (green)',
  ' Example14 (yellow)',
  ' Example15 (red)',
  ' Example16 (pink)',
  ' Example17 (pink)',
  ' Example18 (green)',
  ' Example19 (sky)']]

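Flatten the list so that it no longer contains nested lists. (The nesting is why the split fails: if you run [x.split() for x in my_list], you get an error, because the elements of my_list are themselves lists, not strings.)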
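To make the cause of the error concrete, here is a minimal sketch. The list `nested` is a shortened, made-up stand-in for the data in the question, and the variable names are only for illustration.

```python
# Shortened stand-in for the nested list in the question (illustrative data).
nested = [['Example1 (purple)', ' Example4 (yellow)'],
          ['Example8 (purple)', ' Example14 (yellow)']]

# .str is an accessor on pandas Series; a plain Python list does not have it.
has_str_accessor = hasattr(nested, 'str')

# Each element of the outer list is itself a list, so .split is missing too.
try:
    [x.split() for x in nested]
    split_error = None
except AttributeError as err:
    split_error = str(err)

# Iterating one level deeper reaches the strings, where .split works.
tokens = [item.split() for sublist in nested for item in sublist]

print(has_str_accessor)
print(split_error)
print(tokens[1])
```

The last line shows that once you reach the individual strings, the usual string methods are available.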

Define a flat_list function and flatten the list:

def flat_list(nested):
    # Flatten one level of nesting into a single list
    return [item for sublist in nested for item in sublist]

flat = flat_list(my_list)
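If you would rather not write the flattening yourself, the standard library's itertools.chain.from_iterable does the same one-level flattening. A small sketch, with `my_list_small` and `flat_small` as made-up names and data:

```python
from itertools import chain

# Shortened stand-in for my_list (illustrative data).
my_list_small = [['Example1 (purple)', ' Example2 (blue)'],
                 ['Example8 (purple)']]

# chain.from_iterable walks each inner list in order and yields its items.
flat_small = list(chain.from_iterable(my_list_small))

print(flat_small)
```

Like the comprehension, this flattens exactly one level and keeps the strings unchanged, leading spaces included.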

Create an empty data frame:

df = pd.DataFrame({})

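Extract the elements of the flat list. For each string, strip the surrounding whitespace, split it on the space, and take element 0 as the name (e.g. "Example1"), stripping it once more to be safe. Do the same again, but take element 1 for the color. Wrapping the pair in parentheses, separated by a comma, returns it as a tuple.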

splitout = [(x.strip().split(' ')[0].strip(), x.strip().split(' ')[1]) for x in pd.Series(flat)]

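Set up the two data frame columns. The first just takes the first element of each split, which is always the Example name; the second uses re.sub to remove the parentheses from the color.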

df['Example'] = [x[0] for x in splitout]
df['Color'] = [re.sub(r'[()]', '', x[1]) for x in splitout]  # drop "(" and ")"

      Example   Color
0    Example1  purple
1    Example2    blue
2    Example3  orange
3    Example4  yellow
4    Example5     red
5    Example6    pink
6    Example7     sky
7    Example8  purple
8    Example9    blue
9   Example10  orange
10  Example11     sky
11  Example12   green
12  Example13   green
13  Example14  yellow
14  Example15     red
15  Example16    pink
16  Example17    pink
17  Example18   green
18  Example19     sky
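As a possible alternative to the split and re.sub steps, pandas can pull both columns out in one pass with Series.str.extract and a regular expression with named groups. This is only a sketch: the pattern, `flat_sample`, and `df_sample` are my own illustrative names, and the pattern assumes every string looks like "ExampleN (color)".

```python
import pandas as pd

# Shortened stand-in for the flattened list (illustrative data).
flat_sample = ['Example1 (purple)', ' Example4 (yellow)', ' Example7 (sky)']

# Named groups become the column names of the result.
# \s* absorbs the leading space; [^)]+ captures the text inside the parentheses.
pattern = r'\s*(?P<Example>\S+)\s*\((?P<Color>[^)]+)\)'

df_sample = pd.Series(flat_sample).str.extract(pattern)

print(df_sample)
```

Strings that do not match the pattern would come back as NaN in both columns, so it is worth checking for those if your real data is less regular.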

You can then pivot this into a wider data frame with one column per color:

pd.pivot_table(df.assign(v=1), index='Example', columns='Color', values='v')

Color      blue  green  orange  pink  purple  red  sky  yellow
Example                                                       
Example1    NaN    NaN     NaN   NaN     1.0  NaN  NaN     NaN
Example10   NaN    NaN     1.0   NaN     NaN  NaN  NaN     NaN
Example11   NaN    NaN     NaN   NaN     NaN  NaN  1.0     NaN
Example12   NaN    1.0     NaN   NaN     NaN  NaN  NaN     NaN
Example13   NaN    1.0     NaN   NaN     NaN  NaN  NaN     NaN
Example14   NaN    NaN     NaN   NaN     NaN  NaN  NaN     1.0
Example15   NaN    NaN     NaN   NaN     NaN  1.0  NaN     NaN
Example16   NaN    NaN     NaN   1.0     NaN  NaN  NaN     NaN
Example17   NaN    NaN     NaN   1.0     NaN  NaN  NaN     NaN
Example18   NaN    1.0     NaN   NaN     NaN  NaN  NaN     NaN
Example19   NaN    NaN     NaN   NaN     NaN  NaN  1.0     NaN
Example2    1.0    NaN     NaN   NaN     NaN  NaN  NaN     NaN
Example3    NaN    NaN     1.0   NaN     NaN  NaN  NaN     NaN
Example4    NaN    NaN     NaN   NaN     NaN  NaN  NaN     1.0
Example5    NaN    NaN     NaN   NaN     NaN  1.0  NaN     NaN
Example6    NaN    NaN     NaN   1.0     NaN  NaN  NaN     NaN
Example7    NaN    NaN     NaN   NaN     NaN  NaN  1.0     NaN
Example8    NaN    NaN     NaN   NaN     1.0  NaN  NaN     NaN
Example9    1.0    NaN     NaN   NaN     NaN  NaN  NaN     NaN
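If the NaN cells are inconvenient, one option is pd.crosstab, which counts occurrences and puts 0 where a combination does not occur. A small sketch with made-up rows; `df_small` is a hypothetical stand-in for df:

```python
import pandas as pd

# Hypothetical stand-in for df with the same two columns.
df_small = pd.DataFrame({
    'Example': ['Example4', 'Example5', 'Example14'],
    'Color': ['yellow', 'red', 'yellow'],
})

# Rows are Example names, columns are colors, cells are counts (0 or 1 here).
table = pd.crosstab(df_small['Example'], df_small['Color'])

print(table)
```

This gives integer indicator columns, which are easier to filter on than floats mixed with NaN.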

Full code:

import pandas as pd
import re

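# my_list is the nested list defined at the top of this answer
def flat_list(nested):
    # Flatten one level of nesting into a single list
    return [item for sublist in nested for item in sublist]

flat = flat_list(my_list)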

splitout = [(x.strip().split(' ')[0].strip(), x.strip().split(' ')[1]) for x in pd.Series(flat)]

df = pd.DataFrame({})
df['Example'] = [x[0] for x in splitout]
df['Color'] = [re.sub(r'[()]', '', x[1]) for x in splitout]  # drop "(" and ")"

pivot = pd.pivot_table(df.assign(v=1), index='Example', columns='Color', values='v')
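Since the question asks for each color in its own data frame, one way to get there from a frame with Example and Color columns is to group by Color and collect the groups in a dict. This is a sketch under that assumption; `df_demo`, `frames_by_color`, and `yellow_df` are illustrative names, and the rows are made up.

```python
import pandas as pd

# Hypothetical stand-in for df with the same two columns.
df_demo = pd.DataFrame({
    'Example': ['Example4', 'Example5', 'Example14', 'Example15'],
    'Color': ['yellow', 'red', 'yellow', 'red'],
})

# One DataFrame per color, keyed by the color name.
frames_by_color = {
    color: group.reset_index(drop=True)
    for color, group in df_demo.groupby('Color')
}

yellow_df = frames_by_color['yellow']
print(yellow_df)
```

A dict keyed by color tends to be easier to work with than a separate variable per color, because it keeps working when a new color shows up in the data.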

Answer 2

If you don't need to keep the inner lists, you can do it with a nested list comprehension:

[item for inner in my_list for item in inner if 'yellow' in item]

Output:

[' Example4 (yellow)', ' Example14 (yellow)']

If you want to keep the inner lists, you can do this:

[ [item for item in inner if 'yellow' in item] for inner in my_list ]

Output:

[[' Example4 (yellow)'], [' Example14 (yellow)']]
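If you need this for several colors, both comprehensions can be wrapped in one small helper. This is just a sketch: `filter_nested` and its `keep_inner` flag are names I made up, and `sample` is shortened illustrative data.

```python
def filter_nested(nested, keyword, keep_inner=False):
    """Return the strings in `nested` that contain `keyword`.

    With keep_inner=True the outer/inner list structure is preserved.
    """
    if keep_inner:
        return [[item for item in inner if keyword in item] for inner in nested]
    return [item for inner in nested for item in inner if keyword in item]


# Shortened stand-in for my_list (illustrative data).
sample = [['Example1 (purple)', ' Example4 (yellow)'],
          ['Example8 (purple)', ' Example14 (yellow)']]

flat_yellow = filter_nested(sample, 'yellow')
nested_yellow = filter_nested(sample, 'yellow', keep_inner=True)

print(flat_yellow)
print(nested_yellow)
```

Note that `keyword in item` is a plain substring test, so it would also match a label such as "yellowgreen"; match on "(yellow)" instead if you need the exact color.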

Filtering a nested list in Python
