Question

Suppose I have a df in which one column has 50% of its values missing.

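How can I drop some of the rows that are missing a value in that column, so that the column's missing share goes down by 10%?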

Basically, how do I lower the percentage of missing values in the column from 50% to 40%?

Input (50% of the values missing, 6/12):

Output (40% of the values missing, 4/10): we dropped the last 2 NaN rows, the ones with IDs 8 and 10.
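Going from 6/12 to 4/10 means removing 2 rows, not 10% of the 12 rows (which would be 1). A minimal sketch, assuming pandas and two hypothetical helpers, `rows_to_drop` and `drop_missing_to_target` (neither comes from the question or the answers): it finds the smallest number k of NaN rows to drop so that (missing - k) / (rows - k) is at most the target, then drops the last k NaN rows, as in the expected output. The 12-row frame and its `value` column are hypothetical, since the question's actual table is not shown.

```python
import pandas as pd


def rows_to_drop(n_rows, n_missing, target):
    """Smallest k such that (n_missing - k) / (n_rows - k) <= target."""
    k = 0
    # the k < n_missing check also avoids 0/0 when every row is NaN
    while k < n_missing and (n_missing - k) / (n_rows - k) > target:
        k += 1
    return k


def drop_missing_to_target(df, column, target):
    """Drop the last k rows where `column` is NaN so its missing share reaches `target`."""
    mask = df[column].isna()
    k = rows_to_drop(len(df), int(mask.sum()), target)
    if k == 0:
        return df.copy()  # nothing to drop (and nan_labels[-0:] would select every NaN row)
    nan_labels = df.index[mask.to_numpy()]
    return df.drop(nan_labels[-k:])


# Hypothetical data: 12 rows, 6 of them NaN in "value" (50% missing)
df = pd.DataFrame(
    {"value": [1.0, None, 3.0, None, 5.0, None, 7.0, None, 9.0, None, 11.0, None]},
    index=pd.Index(range(1, 13), name="ID"),
)
result = drop_missing_to_target(df, "value", 0.4)
print(rows_to_drop(12, 6, 0.4))       # 2
print(result["value"].isna().mean())  # 0.4
```

Solving the inequality with a small loop rather than a closed-form ceil() keeps it free of floating-point rounding surprises at exact targets such as 4/10.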

Answer 1

Try this:

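import numpy as np
import pandas as pd

# index labels of the rows where the column is NaN ('your_column' is a placeholder)
nanEntries = df.index[df['your_column'].isna()].tolist()
# randomly choose 10% of the total row count among them, without picking the same row twice
dropIndices = np.random.choice(nanEntries, size=min(len(nanEntries), int(df.shape[0] * 0.1)), replace=False)
# drop them; drop() returns a new DataFrame, so reassign
df = df.drop(dropIndices)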
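The same random idea can be written with pandas' built-in `DataFrame.sample`, whose `random_state` argument makes the choice reproducible. A sketch, assuming a hypothetical helper name `drop_random_missing` and a hypothetical `value` column; like the answer, it sizes the drop as a fraction of all rows.

```python
import numpy as np
import pandas as pd


def drop_random_missing(df, column, frac_of_rows, seed=None):
    """Randomly drop rows where `column` is NaN; count = frac_of_rows * len(df), capped."""
    missing = df[df[column].isna()]
    # never ask for more rows than there are NaN rows
    n = min(len(missing), int(len(df) * frac_of_rows))
    to_drop = missing.sample(n=n, random_state=seed).index
    return df.drop(to_drop)


# Hypothetical frame: 20 rows, every odd position missing (50%)
df = pd.DataFrame({"value": [np.nan if i % 2 else float(i) for i in range(20)]})
result = drop_random_missing(df, "value", 0.1, seed=0)
print(len(result), result["value"].isna().sum())  # 18 8
```

Because the count is a fraction of all rows and is truncated by int(), 10% of the question's 12 rows is only 1 row, which leaves 5/11 (about 45%) missing rather than 40%.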

Answer 2

To get an array of the index labels of the rows that have NaN in the column, use:

nan_indices = df.index[df['your_column'].isna()]
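`nan_indices` holds index labels, not positions, which is what `df.drop` expects. A small illustration on a hypothetical frame with a non-default index:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index does not start at 0
df = pd.DataFrame({"your_column": [1.0, np.nan, 3.0, np.nan]}, index=[10, 11, 12, 13])
nan_indices = df.index[df["your_column"].isna()]
print(list(nan_indices))  # [11, 13]
```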

To drop, for example, the first 20% of them, use:

df.drop(nan_indices[:int(len(nan_indices) * 0.2)])  # returns a new DataFrame; pass inplace=True to modify the original instead
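The question's expected output removes the last NaN rows (IDs 8 and 10) rather than the first ones; slicing from the end of `nan_indices` does that. One pitfall: `nan_indices[-0:]` is the whole index, so k = 0 has to be handled separately. A sketch with a hypothetical helper `drop_last_nan_rows` on hypothetical data:

```python
import numpy as np
import pandas as pd


def drop_last_nan_rows(df, column, k):
    """Drop the last k rows (in row order) where `column` is NaN."""
    nan_indices = df.index[df[column].isna()]
    if k <= 0:
        return df.copy()  # nan_indices[-0:] would be every NaN row
    return df.drop(nan_indices[-k:])


# Hypothetical 6-row frame with NaN at labels 1, 3 and 4
df = pd.DataFrame({"your_column": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]}, index=range(6))
print(drop_last_nan_rows(df, "your_column", 2).index.tolist())  # [0, 1, 2, 5]
```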

How to drop a percentage of the rows where a column's value is NaN
