Question

I want to filter a DataFrame on several columns, keeping the rows that contain certain values.

For example:

    code tag number floor  note
1   1111  *   **     34     no 
2   2323  7   899     7     no
3   3677  #   900    11     no
4   9897  10  134    *      no
5    #    #   566    11     no
6   3677  55  908    11     no

I want to keep every row that contains #, * or ** in any of the columns code, tag, number and floor.

What I want is:

    code tag number floor  note
1   1111  *   **     34     no 
3   3677  #   900    11     no
4   9897  10  134    *      no
5    #    #   566    11     no

I tried the isin method. It works for a single column, but I could not get it to work across multiple columns. Thanks!

Answer 1

I think you need boolean indexing with apply, isin and any:

import pandas as pd

vals = ['#', '*', '**']  # values to search for; naming this `list` would shadow the built-in
cols = ['code', 'tag', 'number', 'floor']
df[df[cols].apply(lambda x: x.isin(vals).any(), axis=1)]

Output:

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no
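Because DataFrame.isin already works element-wise on a whole frame, the row-wise apply is not strictly needed: select the columns, call isin, and collapse each row with any(axis=1). A minimal sketch, assuming every cell of the sample data is stored as a string (the df construction and the name vals are illustrative):

```python
import pandas as pd

# Sample data from the question; assumes every cell is a string.
df = pd.DataFrame(
    {
        'code':   ['1111', '2323', '3677', '9897', '#', '3677'],
        'tag':    ['*', '7', '#', '10', '#', '55'],
        'number': ['**', '899', '900', '134', '566', '908'],
        'floor':  ['34', '7', '11', '*', '11', '11'],
        'note':   ['no'] * 6,
    },
    index=range(1, 7),
)

vals = ['#', '*', '**']
cols = ['code', 'tag', 'number', 'floor']

# isin returns a boolean DataFrame of the same shape as df[cols];
# any(axis=1) is True for rows with at least one matching cell.
mask = df[cols].isin(vals).any(axis=1)
result = df[mask]
print(result)
```

This avoids calling a Python lambda once per row, which usually matters on larger frames.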

Answer 2

You can also use df.applymap (in pandas 2.1+ it is deprecated and renamed DataFrame.map):

s = {'*', '**', '#'}
df[df.applymap(lambda x: x in s).any(axis=1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

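piR suggested a crazy (but, at the time, working!) alternative. It relies on pandas applying & element-wise between each row's set and a Python set, which may not behave the same way in newer pandas versions:

df[df.apply(set, axis=1) & {'*', '**', '#'}]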

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no
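The same "build a set per row" idea can be spelled out without relying on & between a Series and a set: test each row for overlap with set.isdisjoint, which accepts any iterable, including a row Series. A sketch under the same string-data assumption; the names targets and cols are illustrative:

```python
import pandas as pd

# Sample data from the question; assumes every cell is a string.
df = pd.DataFrame(
    {
        'code':   ['1111', '2323', '3677', '9897', '#', '3677'],
        'tag':    ['*', '7', '#', '10', '#', '55'],
        'number': ['**', '899', '900', '134', '566', '908'],
        'floor':  ['34', '7', '11', '*', '11', '11'],
        'note':   ['no'] * 6,
    },
    index=range(1, 7),
)

targets = {'*', '**', '#'}
cols = ['code', 'tag', 'number', 'floor']

# Keep a row when its values and `targets` share at least one element.
mask = df[cols].apply(lambda row: not targets.isdisjoint(row), axis=1)
result = df[mask]
print(result)
```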

Answer 3

Option 1
Assumes the placeholder 'pir' does not already appear anywhere in the data

df[df.replace(['#', '*', '**'], 'pir').eq('pir').any(axis=1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no
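Option 1 silently gives wrong rows if the placeholder already occurs in the data. A minimal sketch that checks that assumption before relying on it; the explicit ValueError is an illustrative choice, not part of the original answer:

```python
import pandas as pd

# Sample data from the question; assumes every cell is a string.
df = pd.DataFrame(
    {
        'code':   ['1111', '2323', '3677', '9897', '#', '3677'],
        'tag':    ['*', '7', '#', '10', '#', '55'],
        'number': ['**', '899', '900', '134', '566', '908'],
        'floor':  ['34', '7', '11', '*', '11', '11'],
        'note':   ['no'] * 6,
    },
    index=range(1, 7),
)

targets = ['#', '*', '**']
placeholder = 'pir'  # the sentinel Option 1 uses

# Guard: the trick is only valid if the placeholder is absent beforehand.
if df.eq(placeholder).to_numpy().any():
    raise ValueError(f'placeholder {placeholder!r} already present in the data')

# Replace every target with the placeholder, then look for it row by row.
result = df[df.replace(targets, placeholder).eq(placeholder).any(axis=1)]
print(result)
```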

Option 2
Obnoxious numpy broadcasting. Fast on small inputs, but scales poorly

import numpy as np

df[(df.values[None, :] == np.array(['*', '**', '#'])[:, None, None]).any(0).any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no
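To see why Option 2 scales poorly, it helps to track the array shapes: the comparison materializes one boolean slice per target value, so the intermediate array has (number of targets) × (rows) × (columns) elements. A sketch on the sample values as a plain object array; the targets array is given dtype=object here (an assumption that keeps the comparison on the object loop):

```python
import numpy as np

# Sample data from the question as an object array, 6 rows x 5 columns.
values = np.array(
    [
        ['1111', '*', '**', '34', 'no'],
        ['2323', '7', '899', '7', 'no'],
        ['3677', '#', '900', '11', 'no'],
        ['9897', '10', '134', '*', 'no'],
        ['#', '#', '566', '11', 'no'],
        ['3677', '55', '908', '11', 'no'],
    ],
    dtype=object,
)
targets = np.array(['*', '**', '#'], dtype=object)

lhs = values[None, :]          # shape (1, 6, 5)
rhs = targets[:, None, None]   # shape (3, 1, 1)
cube = lhs == rhs              # broadcasts to (3, 6, 5): one slice per target
per_cell = cube.any(axis=0)    # (6, 5): cell equals some target
per_row = per_cell.any(axis=1) # (6,): row has at least one matching cell

print(cube.shape, per_row)
```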

Option 3
Less obnoxious: np.in1d

import numpy as np

df[np.in1d(df.values, ['*', '**', '#']).reshape(df.shape).any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no
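NumPy 2.0 deprecated np.in1d in favor of np.isin. Since np.isin returns an array with the same shape as its first argument, the flatten-then-reshape step is no longer needed. A sketch under the same string-data assumption:

```python
import numpy as np
import pandas as pd

# Sample data from the question; assumes every cell is a string.
df = pd.DataFrame(
    {
        'code':   ['1111', '2323', '3677', '9897', '#', '3677'],
        'tag':    ['*', '7', '#', '10', '#', '55'],
        'number': ['**', '899', '900', '134', '566', '908'],
        'floor':  ['34', '7', '11', '*', '11', '11'],
        'note':   ['no'] * 6,
    },
    index=range(1, 7),
)

targets = ['*', '**', '#']

# np.isin keeps the (rows, columns) shape, so any(axis=1) reduces per row directly.
mask = np.isin(df.to_numpy(), targets).any(axis=1)
result = df[mask]
print(result)
```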

Option 4
map on top of map

df[list(
    map(bool,
        map({'*', '**', '#'}.intersection,
            map(set,
                zip(*(df[c].values.tolist() for c in df)))))
)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no
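The nested map calls in Option 4 do the same thing as this list comprehension, which may be easier to read: turn each row into a set, intersect it with the targets, and keep the row if the intersection is non-empty. A sketch under the same string-data assumption; like Option 4, it scans every column:

```python
import pandas as pd

# Sample data from the question; assumes every cell is a string.
df = pd.DataFrame(
    {
        'code':   ['1111', '2323', '3677', '9897', '#', '3677'],
        'tag':    ['*', '7', '#', '10', '#', '55'],
        'number': ['**', '899', '900', '134', '566', '908'],
        'floor':  ['34', '7', '11', '*', '11', '11'],
        'note':   ['no'] * 6,
    },
    index=range(1, 7),
)

targets = {'*', '**', '#'}

# itertuples(index=False) yields one tuple of cell values per row.
mask = [bool(targets & set(row)) for row in df.itertuples(index=False)]
result = df[mask]
print(result)
```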
