I can't filter a dataframe by column value

Question

I am trying to filter a dataframe by a column value, but I can't figure it out. Suppose I have the following dataframe:

Index Column1 Column2
1      path1   ['red']
2      path2   ['red' 'blue']
3      path3   ['blue']

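My dataframe has exactly this format. I want to create a sub-dataframe containing only the rows where Column2 is ['red']. That would be just the first row.

What I have tried so far is: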

classes = ['red']
df=df.loc[df['Column2'].isin(classes)]

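But it doesn't work. I get this warning, and the dataframe just stays the same:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  f = lambda x, y: htable.ismember_object(x, values)

How can this be done correctly? Thanks.

Edit: I think I did not explain myself very well.

My data looks like ['red' 'blue'], with no comma between the items, and its dtype is "object". I want to filter the original dataframe so that it shows the rows where 'Column2' contains, for example, red. In this case it would show rows 1 and 2. Is that possible?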
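
A minimal sketch of the situation described above, assuming each Column2 cell is a plain string such as "['red' 'blue']". The question only says the dtype is object, so this is an assumption; the cells could also hold NumPy arrays. The sketch does not try to reproduce the warning. It only shows that isin tests whole-cell equality, so under this assumption no row matches 'red':

```python
import pandas as pd

# Assumption: each Column2 cell is a string, not a real list.
df = pd.DataFrame({
    "Index": [1, 2, 3],
    "Column1": ["path1", "path2", "path3"],
    "Column2": ["['red']", "['red' 'blue']", "['blue']"],
})

classes = ["red"]
# isin compares the whole cell: "['red']" is not equal to "red".
mask = df["Column2"].isin(classes)
print(mask.tolist())
```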

Answer 1

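One possible solution is to compare sets. The advantage is that when classes has more than one item, the order of the items does not matter. First convert the strings in Column2 to lists: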

import ast
df['Column2'] = df['Column2'].str.replace(' ', ', ').apply(ast.literal_eval)

Alternative:

df['Column2'] = df['Column2'].fillna("''").str.findall(r"'(.+?)'")

Then filter the rows:

classes = ['red']
# keep rows whose list shares at least one item with classes
df1 = df[~df.Column2.map(set(classes).isdisjoint)]
print(df1)

   Index Column1      Column2
0      1   path1        [red]
1      2   path2  [red, blue]
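
The set comparison can be sketched end to end as follows. The dataframe is rebuilt here with string cells, which is an assumption about the data. The names wanted, any_match and exact_match are illustrative and do not come from the answer. The second filter is an extra variant that covers the original request of keeping only the rows where Column2 is exactly ['red']:

```python
import pandas as pd

df = pd.DataFrame({
    "Index": [1, 2, 3],
    "Column1": ["path1", "path2", "path3"],
    "Column2": ["['red']", "['red' 'blue']", "['blue']"],  # assumed strings
})

# Pull every quoted item out of each string, giving real Python lists.
df["Column2"] = df["Column2"].fillna("''").str.findall(r"'(.+?)'")

classes = ["red"]
wanted = set(classes)

# Rows sharing at least one item with classes.
any_match = df[~df["Column2"].map(wanted.isdisjoint)]

# Rows whose items are exactly the items in classes.
exact_match = df[df["Column2"].map(lambda items: set(items) == wanted)]

print(any_match)
print(exact_match)
```

With the sample data, any_match keeps the rows with Index 1 and 2, while exact_match keeps only the row with Index 1.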

Answer 2

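Possible solution

After reproducing the exact same dataframe:

   Index Column1         Column2
0      1   path1         ['red']
1      2   path2  ['red' 'blue']
2      3   path3        ['blue']

you can try doing this by removing the [, ] and ' characters:

# regex=False treats '[' and ']' as literal characters, not regex syntax
df['Column2'] = df['Column2'].str.replace('[', '', regex=False)
df['Column2'] = df['Column2'].str.replace(']', '', regex=False)
df['Column2'] = df['Column2'].str.replace('\'', '', regex=False)

Now do:

classes = ['red']
# '|' joins the classes into a pattern that matches any of them
df = df[df.Column2.str.contains('|'.join(classes))]

Output:

   Index Column1   Column2
0      1   path1       red
1      2   path2  red blue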
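
A hedged variant of the string-matching idea, assuming the cells are strings as in the reproduced dataframe. Instead of stripping the brackets and quotes, this sketch matches each class together with its surrounding quotes, so that 'red' would not also match a longer name such as 'darkred'. The extra class "green" and the names pattern and df_red are hypothetical and only serve the illustration:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Index": [1, 2, 3],
    "Column1": ["path1", "path2", "path3"],
    "Column2": ["['red']", "['red' 'blue']", "['blue']"],  # assumed strings
})

classes = ["red", "green"]  # "green" is a hypothetical extra class

# Escape each class and keep its quotes, then join with '|' so the
# pattern matches a cell containing any one of the quoted classes.
pattern = "|".join("'" + re.escape(c) + "'" for c in classes)
df_red = df[df["Column2"].str.contains(pattern, regex=True)]
print(df_red)
```

With the sample data this keeps the rows with Index 1 and 2.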

