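How to drop a percentage of the rows where a column value is NaN

Time: 2019-02-11 11:32:43

Tags: python pandas numpy dataframe

Suppose I have a df in which 50% of the values in one column are missing.

How can I drop enough of the rows that are missing a value in that column to remove 10% of them?

Basically, how can I lower the column's percentage of missing values from 50% to 40%?

Input (50% of values missing (6/12)):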

         0
    0  1.0
    1  1.0
    2  NaN
    3  NaN
    4  NaN
    5  1.0
    6  NaN
    7  1.0
    8  NaN
    9  1.0
   10  NaN
   11  1.0

Output (40% of values missing (4/10)): we dropped the last 2 NaN rows, the ones with index 8 and 10

         0
    0  1.0
    1  1.0
    2  NaN
    3  NaN
    4  NaN
    5  1.0
    6  NaN
    7  1.0
    9  1.0
   11  1.0
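
As a side note, the arithmetic behind this example can be made explicit. The symbols here are introduced only for illustration: n is the number of rows, m is the number of NaN rows, t is the target missing fraction, and k is the number of NaN rows to drop. Going from 50% to 40% is a drop of 10 percentage points, which here takes 2 rows rather than 10% of the 12 rows:

```latex
\[
\frac{m-k}{n-k} = t
\;\Longrightarrow\;
k = \frac{m - t\,n}{1 - t},
\qquad
k = \frac{6 - 0.4 \cdot 12}{1 - 0.4} = \frac{1.2}{0.6} = 2 .
\]
```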

2 answers:

Answer 0: (score: 0)

Try this:

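import numpy as np
import pandas as pd

# Find the index labels of the rows where the column is NaN
# ('your_column' is a placeholder for the actual column label)
nanEntries = df.index[pd.isnull(df['your_column'])].tolist()
# Randomly choose as many of them as 10% of the total row count, without repeats
dropIndices = np.random.choice(nanEntries, size=int(df.shape[0] * 0.1), replace=False)
# Drop them (drop returns a new DataFrame, so reassign)
df = df.drop(dropIndices)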

Answer 1: (score: 0)

To get an array of the index labels where the column has NaN values, use:

nan_indices = df.index[df['your_column'].isna()]

To drop, for example, the first 20% of them, use:

df.drop(nan_indices[:int(len(nan_indices) * 0.2)])   # returns a new DataFrame; to modify the original instead, pass inplace=True
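
Neither fixed percentage lands on the 40% target for the 12-row example: `int(12 * 0.1)` and `int(6 * 0.2)` are both 1, while the expected output drops 2 rows. A minimal sketch of one way to compute the count from the target fraction follows. The helper name `drop_nan_rows_to_target` and its parameters are hypothetical and not part of pandas. It assumes the last NaN rows are the ones to drop, as in the question's expected output, and that index labels are unique.

```python
import math

import numpy as np
import pandas as pd


def drop_nan_rows_to_target(df, column, target_frac):
    """Drop trailing NaN rows of `column` until its missing fraction <= target_frac.

    Hypothetical helper for illustration; not a pandas API.
    """
    if not 0 <= target_frac < 1:
        raise ValueError("target_frac must be in [0, 1)")
    nan_index = df.index[df[column].isna()]
    n_rows, n_missing = len(df), len(nan_index)
    # Smallest integer k with (n_missing - k) / (n_rows - k) <= target_frac.
    # The small tolerance guards against floating-point noise before rounding up.
    raw = (n_missing - target_frac * n_rows) / (1 - target_frac)
    k = max(math.ceil(raw - 1e-9), 0)
    if k == 0:
        return df.copy()
    # Drop the last k NaN rows; drop() returns a new DataFrame.
    return df.drop(nan_index[-k:])


# The question's example: 6 of 12 values missing in column 0.
df = pd.DataFrame(
    {0: [1.0, 1.0, np.nan, np.nan, np.nan, 1.0, np.nan, 1.0, np.nan, 1.0, np.nan, 1.0]}
)
result = drop_nan_rows_to_target(df, 0, 0.4)
print(result)
print(result[0].isna().mean())
```

To drop random NaN rows instead of the last ones, `nan_index[-k:]` could be replaced with `np.random.choice(nan_index, size=k, replace=False)`.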