Question

Suppose I have a DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': [1, 2, 3, 4, 5],
                   'Name': ['A', 'B', 100, 'C', 'D'],
                   'col1': [np.nan, 'bbby', 'cccy', 'dddy', 'EEEEE'],
                   'col2': ['water', np.nan, 'WATER', 'soil', 'cold air'],
                   'col3': ['watermelone', 'hot AIR', 'air conditioner', 'drink', 50000],
                   'Results': [1000, 2000, 3000, 4000, 5000]})


Out

Index  Name  col1     col2        col3             Results
    1  A     NaN      water       watermelone      1000
    2  B     bbby     NaN         hot AIR          2000
    3  100   cccy     WATER       air conditioner  3000
    4  C     dddy     soil        drink            4000
    5  D     EEEEE    cold air    50000            5000

I have a list: matches = ['wat','air']

How can I select all rows where col1, col2, or col3 contains any item i in matches?

Expected output: only the rows in which col1, col2, or col3 contains at least one of the substrings in matches.

Answer 1

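You can use .T to transpose the DataFrame, check the values column by column with str.contains, and then transpose back. str.contains treats its pattern as a regular expression by default, so it accepts several alternatives when they are separated by |, which is why I convert the list to a string with matches = '|'.join(matches).

The benefit of transposing the DataFrame is that you can use column-wise pandas methods instead of looping through rows or writing a long lambda x: list comprehension. This technique should have good performance compared to answers that use lambda x with axis=1.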
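
A minimal sketch of the transposition idea, not the answerer's original code. It assumes that only col1, col2, and col3 are searched and that the match should ignore case (suggested by 'WATER' and 'hot AIR' in the sample data); the names pattern, cols, transposed, and row_hits are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': [1, 2, 3, 4, 5],
                   'Name': ['A', 'B', 100, 'C', 'D'],
                   'col1': [np.nan, 'bbby', 'cccy', 'dddy', 'EEEEE'],
                   'col2': ['water', np.nan, 'WATER', 'soil', 'cold air'],
                   'col3': ['watermelone', 'hot AIR', 'air conditioner', 'drink', 50000],
                   'Results': [1000, 2000, 3000, 4000, 5000]})

matches = ['wat', 'air']
pattern = '|'.join(matches)      # 'wat|air', read by str.contains as a regex alternation
cols = ['col1', 'col2', 'col3']  # assumption: only these columns are searched

# After transposing, every original row becomes one column.
transposed = df[cols].T

# Test each transposed column (one original row) with a column-wise string method.
# astype(str) lets str.contains handle the integer 50000;
# na=False counts missing values as "no match";
# case=False is an assumption about the intended matching.
row_hits = [
    transposed[label].astype(str).str.contains(pattern, case=False, na=False).any()
    for label in transposed.columns
]

# Align the flags with the original index and keep the matching rows.
result = df[pd.Series(row_hits, index=df.index)]
print(result)
```

With a case-sensitive match (dropping case=False), the row whose only hit is 'hot AIR' would no longer be selected.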

Answer 2

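You could also try the following. na=False makes missing and non-string values count as no match, so the combined mask is purely boolean:

pattern = '|'.join(matches)
df = df[df['col1'].str.contains(pattern, na=False) | df['col2'].str.contains(pattern, na=False) | df['col3'].str.contains(pattern, na=False)]

This prints: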

  Name   col1      col2             col3
1    A   aadY     water      watermelone
2    B   bbbY       air          hot AIR
3    B   cccY     water  air conditioner
5    D  EEEEE  cold air              eat
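
The same filter can be written without repeating each column name. The following is a hedged sketch rather than part of the original answer: the names cols, pattern, and mask are illustrative, case=False is an assumption about the intended matching, and re.escape is added so that the entries in matches are treated as literal substrings even if they contain regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': [1, 2, 3, 4, 5],
                   'Name': ['A', 'B', 100, 'C', 'D'],
                   'col1': [np.nan, 'bbby', 'cccy', 'dddy', 'EEEEE'],
                   'col2': ['water', np.nan, 'WATER', 'soil', 'cold air'],
                   'col3': ['watermelone', 'hot AIR', 'air conditioner', 'drink', 50000],
                   'Results': [1000, 2000, 3000, 4000, 5000]})

matches = ['wat', 'air']
cols = ['col1', 'col2', 'col3']  # assumption: the columns to search

# Escape each substring so characters such as '.' or '+' are matched literally.
pattern = '|'.join(re.escape(m) for m in matches)

# Build one boolean column per searched column, then keep rows with any hit.
mask = (
    df[cols]
    .apply(lambda col: col.astype(str).str.contains(pattern, case=False, na=False))
    .any(axis=1)
)

result = df[mask]
print(result)
```

To search every column of the DataFrame instead, cols could be set to list(df.columns).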

Filter rows if any value in a list of substrings is contained in any column of a DataFrame
