Question

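I have a pandas DataFrame in which the "genres" column holds multiple values separated by "|". I've put a picture below.

DataFrame containing movie details:

If I use the split function, the values are converted to lists, which are unhashable.

Now I want to select only the rows of the DataFrame where "genres" contains the word "Action". How can I do that?

Thanks in advance.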

Answer 1

Here is one solution using set:

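import pandas as pd

df = pd.DataFrame({'genres': ['A|B|C|D', 'A|B|C', 'B|D']})

# Split on '|', convert each list to a set, and keep rows whose set is a superset of {'D'}
res = df[df['genres'].str.split('|').apply(set) >= {'D'}]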

print(res)

    genres
0  A|B|C|D
2      B|D

This extends naturally to multiple genres:

res = df[df['genres'].str.split('|').apply(set) >= {'A', 'B'}]

print(res)

    genres
0  A|B|C|D
1    A|B|C
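The superset test above can also be spelled out explicitly, which makes it easy to switch between "all of these genres" and "any of these genres". This is only a sketch on the same toy data; the names genre_sets, wanted, has_all and has_any are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'genres': ['A|B|C|D', 'A|B|C', 'B|D']})

# Split each string on '|' and turn the pieces into a set of genre tokens.
genre_sets = df['genres'].str.split('|').apply(set)

wanted = {'C', 'D'}

# Rows whose genres include ALL of the wanted values (subset test).
has_all = df[genre_sets.apply(lambda s: wanted <= s)]

# Rows whose genres include AT LEAST ONE of the wanted values.
has_any = df[genre_sets.apply(lambda s: not s.isdisjoint(wanted))]

print(has_all)
print(has_any)
```

With this data, has_all keeps only row 0, while has_any keeps all three rows.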

Answer 2

You can use this:

df = df[df['genres'].str.contains("Action")]

Example:

data = {'genres': ('Action', 'crime', 'Action|crime', 'Romance|Action', 'Comedy'), 'runtime': (1, 3, 5, 6, 7)}
df = pd.DataFrame(data)
df = df[df['genres'].str.contains("Action")]
print(df)

Output:

           genres  runtime
0          Action        1
2    Action|crime        5
3  Romance|Action        6
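One thing to keep in mind: str.contains does a substring match (and treats the pattern as a regular expression by default), so it would also match a genre that merely contains "Action" inside a longer name, and missing values need handling. The sketch below compares it with an exact token match. The 'Live-Action' genre and the missing value are hypothetical, added only to show the difference.

```python
import pandas as pd

# Hypothetical data: 'Live-Action' and a missing value are added for illustration.
df = pd.DataFrame({
    'genres': ['Action', 'crime', 'Action|crime', 'Live-Action|Comedy', None],
    'runtime': [1, 3, 5, 6, 7],
})

# Substring match: also picks up 'Live-Action'.
# regex=False treats the pattern literally; na=False counts missing values as non-matches.
substring = df[df['genres'].str.contains('Action', regex=False, na=False)]

# Exact token match: split on '|' and test membership in the resulting list.
tokens = df['genres'].fillna('').str.split('|')
exact = df[tokens.apply(lambda parts: 'Action' in parts)]

print(substring)
print(exact)
```

Here substring keeps rows 0, 2 and 3, while exact keeps only rows 0 and 2. If your genre names never overlap like this, the plain str.contains version is probably enough.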

Selecting rows of a DataFrame when a column contains multiple values