Select pandas rows by checking whether a list element contains a value

Time: 2020-11-02 18:50:47

Tags: python pandas

I have a column in a pandas DataFrame where each row holds a list:

                                                   tags  contestId
20              [graphs, greedy, shortest paths, trees]       1437
27                       [binary search, combinatorics]       1436
64    [constructive algorithms, data structures, gre...       1426
81    [binary search, math, number theory, two point...       1423
111   [binary search, brute force, constructive algo...       1419
...                                                 ...        ...
6444                                             [math]         11
6449                               [dp, implementation]         10
6464                                   [implementation]          7
6486                          [hashing, implementation]          2
6488                             [implementation, math]          1

How do I select all records whose tags list contains "math" or "trees"?

1 Answer:

Answer 0: (score: 0)

A quick and dirty solution:

ans = df[df["tags"].apply(lambda el: "math" in el or "trees" in el)]

Output

print(ans)
   index                                             tags  contestId
0     20          [graphs, greedy, shortest paths, trees]       1437
3     81  [binary search, math, number theory, two point]       1423
5   6444                                           [math]         11
9   6488                           [implementation, math]          1
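
The same row-wise check can be generalized to an arbitrary set of tags. The following is only a sketch on a small hand-built frame (a subset of the rows above); the names `wanted`, `mask`, and `selected` are illustrative and not part of the original answer.

```python
import pandas as pd

# Small hand-built sample; the tags column holds real Python lists.
df = pd.DataFrame(
    {
        "tags": [
            ["graphs", "greedy", "shortest paths", "trees"],
            ["binary search", "combinatorics"],
            ["math"],
            ["dp", "implementation"],
            ["implementation", "math"],
        ],
        "contestId": [1437, 1436, 11, 10, 1],
    },
    index=[20, 27, 6444, 6449, 6488],
)

# Hypothetical set of tags to look for.
wanted = {"math", "trees"}

# A row matches when its tag list shares at least one element with `wanted`.
mask = df["tags"].apply(lambda tags: not wanted.isdisjoint(tags))
selected = df[mask]
print(selected)
```

Using a set keeps the lambda unchanged when more tags are added, at the cost of still running a Python-level function once per row.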

Test data

# in.txt

index                                              tags  contestId
20              [graphs, greedy, shortest paths, trees]       1437
27                       [binary search, combinatorics]       1436
64      [constructive algorithms, data structures, gre]       1426
81      [binary search, math, number theory, two point]       1423
111     [binary search, brute force, constructive algo]       1419
6444                                             [math]         11
6449                               [dp, implementation]         10
6464                                   [implementation]          7
6486                          [hashing, implementation]          2
6488                             [implementation, math]          1

Code to reconstruct df (please provide such code next time):

import pandas as pd

df = pd.read_fwf("in.txt")
# Strip the surrounding brackets and split each string into a list of tags
df["tags"] = df["tags"].apply(lambda s: s[1:-1].split(", "))
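
To make the parsing step concrete, here is a minimal sketch that applies the same bracket-stripping lambda to a couple of hand-written strings instead of a file; `raw` and `parsed` are illustrative names.

```python
import pandas as pd

# Strings shaped like the tags column after reading the text file.
raw = pd.Series(["[graphs, greedy, shortest paths, trees]", "[math]"])

# Drop the leading "[" and trailing "]", then split on ", ".
parsed = raw.apply(lambda s: s[1:-1].split(", "))
print(parsed.tolist())
```

Note that an empty "[]" would become a list holding one empty string rather than an empty list, so this parsing is only suitable for rows that have at least one tag.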

Unfortunately, .isin() and .str.contains() do not seem to work directly on a column whose cells are lists.
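
One possible workaround, not part of the original answer, is to flatten the lists with Series.explode() so that .isin() sees one tag per row, then collapse the result back to one boolean per original row. This sketch assumes pandas 0.25 or later and a unique index on df; the sample frame and the names `exploded`, `mask`, and `selected` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "tags": [
            ["graphs", "greedy", "shortest paths", "trees"],
            ["binary search", "combinatorics"],
            ["math"],
            ["implementation", "math"],
        ],
        "contestId": [1437, 1436, 11, 1],
    },
    index=[20, 27, 6444, 6488],
)

# One row per tag; the original index label is repeated for each tag.
exploded = df["tags"].explode()

# Test each tag, then mark a record if any of its tags matched.
mask = exploded.isin(["math", "trees"]).groupby(level=0).any()

# Boolean indexing aligns the mask with df by index label.
selected = df[mask]
print(selected)
```

This avoids a per-row lambda, but it builds an intermediate Series with one entry per tag, so it trades memory for vectorized comparisons.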