Drop pandas DataFrame rows where most values are 0

Posted: 2018-02-02 10:14:07

Tags: python python-3.x pandas data-cleaning

I have a dataset that looks like this:

[Image: the asker's dataset]

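I want to drop rows like 4, 5, etc., because most of their columns are 0, though not all of them. At the same time, I don't want to drop rows like 0 and 1, because only a few of their entries are 0.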

3 answers:

Answer 0 (score: 0)

First, create a column that counts the zeros in each row:

df['no_of_zeros']=(df == 0).astype(int).sum(axis=1)
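The zero count can also be computed without adding a helper column to the DataFrame. A minimal sketch on a small made-up frame (not the asker's data): `df.eq(0)` returns a boolean frame, and summing booleans along `axis=1` counts the `True` values per row, so `astype(int)` is not strictly needed.

```python
import pandas as pd

# Small example frame (hypothetical data, not the asker's dataset)
df = pd.DataFrame({'a': [0, 1, 0], 'b': [0, 2, 3], 'c': [5, 0, 0]})

# df.eq(0) marks every zero with True; summing along axis=1 counts
# the True values, i.e. the number of zeros in each row.
zero_counts = df.eq(0).sum(axis=1)
print(zero_counts.tolist())  # [2, 1, 2]
```

Note that `NaN` does not compare equal to 0, so missing values are not counted as zeros.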

Decide how many zeros per row are acceptable, and filter the DataFrame on that:

df=df[df['no_of_zeros'] < 3].drop(['no_of_zeros'], axis=1)
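If the threshold needs to change, the same filter can be wrapped in a small function. This is a sketch only; `drop_rows_with_many_zeros` and `max_zeros` are hypothetical names, and the `< max_zeros` comparison mirrors the `< 3` condition in this answer.

```python
import pandas as pd


def drop_rows_with_many_zeros(df: pd.DataFrame, max_zeros: int) -> pd.DataFrame:
    """Return only the rows of df that contain fewer than max_zeros zeros.

    Hypothetical helper: same filter as above, but without adding and
    then dropping a temporary 'no_of_zeros' column.
    """
    zero_counts = df.eq(0).sum(axis=1)
    return df[zero_counts < max_zeros]


df = pd.DataFrame({'a': [0, 1, 0, 0],
                   'b': [0, 2, 3, 0],
                   'c': [5, 0, 0, 0],
                   'd': [0, 4, 1, 7]})

# Rows 0 and 3 have three zeros each, so they are dropped.
result = drop_rows_with_many_zeros(df, max_zeros=3)
print(result.index.tolist())  # [1, 2]
```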

Answer 1 (score: 0)

Here is one way:

import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3, 4],
                   [0, 0, 0, 1, 2]],
                  columns=['A', 'B', 'C', 'D', 'E'])

df = df[~((df == 0).astype(int).sum(axis=1) > len(df.columns) / 2)]

#    A  B  C  D  E
# 0  0  1  2  3  4
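The same "more than half" rule can be written as a fraction instead of comparing a count with `len(df.columns) / 2`. A minimal sketch using this answer's example frame: the mean of a boolean row is the share of zeros in it, so keeping rows with a share of at most 0.5 selects the same rows as the expression above.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3, 4],
                   [0, 0, 0, 1, 2]],
                  columns=['A', 'B', 'C', 'D', 'E'])

# mean() of a boolean row is the fraction of zeros in that row
# (0.2 for row 0, 0.6 for row 1).
zero_fraction = df.eq(0).mean(axis=1)

# Keep rows where at most half of the values are zero; equivalent to
# ~(zero count > len(df.columns) / 2).
kept = df[zero_fraction <= 0.5]
print(kept.index.tolist())  # [0]
```

Expressing the threshold as a fraction keeps the cutoff meaningful if columns are added or removed later.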

Answer 2 (score: 0)

Assuming "most" means "more than half of the columns", this works:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'c2': {0: 76, 1: 45, 2: 47, 3: 92, 4: 0, 5: 0, 6: 26, 7: 0, 8: 71},
   ...:  'c3': {0: 0, 1: 3, 2: 6, 3: 9, 4: 0, 5: 0, 6: 12, 7: 0, 8: 15},
   ...:  'c4': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
   ...:  'c5': {0: 23, 1: 0, 2: 23, 3: 23, 4: 0, 5: 0, 6: 23, 7: 0, 8: 23},
   ...:  'c6': {0: 65, 1: 25, 2: 62, 3: 26, 4: 52, 5: 22, 6: 65, 7: 0, 8: 69},
   ...:  'c7': {0: 12, 1: 12, 2: 12, 3: 12, 4: 12, 5: 12, 6: 12, 7: 12, 8: 12},
   ...:  'c8': {0: 0, 1: 0, 2: 8, 3: 9, 4: 0, 5: 0, 6: 4, 7: 0, 8: 4},
   ...:  'cl': {0: 5, 1: 7, 2: 8, 3: 15, 4: 0, 5: 0, 6: 2, 7: 0, 8: 5},
   ...:  'sr': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8}})
   ...:  

In [3]: df
Out[3]: 
   c2  c3  c4  c5  c6  c7  c8  cl  sr
0  76   0   1  23  65  12   0   5   0
1  45   3   1   0  25  12   0   7   1
2  47   6   1  23  62  12   8   8   2
3  92   9   1  23  26  12   9  15   3
4   0   0   1   0  52  12   0   0   4
5   0   0   1   0  22  12   0   0   5
6  26  12   1  23  65  12   4   2   6
7   0   0   1   0   0  12   0   0   7
8  71  15   1  23  69  12   4   5   8

In [4]: df[((df == 0).sum(axis=1) <= len(df.columns) / 2)]
Out[4]: 
   c2  c3  c4  c5  c6  c7  c8  cl  sr
0  76   0   1  23  65  12   0   5   0
1  45   3   1   0  25  12   0   7   1
2  47   6   1  23  62  12   8   8   2
3  92   9   1  23  26  12   9  15   3
6  26  12   1  23  65  12   4   2   6
8  71  15   1  23  69  12   4   5   8

In [5]:
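The filter above counts zeros across every column, including `sr`, which looks like a row counter. If some columns should not take part in the count (an assumption about the asker's data, not something the question states), the mask can be built from a subset of columns while still filtering the full frame. A minimal sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'c2': [76, 0, 0],
                   'c3': [0, 0, 3],
                   'c5': [23, 0, 0],
                   'sr': [0, 1, 2]})

# Columns that take part in the zero count; 'sr' is excluded on the
# assumption that it is an identifier rather than a measurement.
value_cols = df.columns.drop('sr')

zero_counts = df[value_cols].eq(0).sum(axis=1)

# "Most" = more than half of the counted columns, as in the answer above.
kept = df[zero_counts <= len(value_cols) / 2]
print(kept.index.tolist())  # [0]
```

Counting all four columns instead would also keep row 2 (two zeros out of four is not more than half), so the choice of columns can change the result.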