Question

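I have to solve this problem:

Objective: drop columns that are mostly missing rows.

Inputs:
1. Data frame df: a pandas DataFrame
2. threshold: determines which columns are dropped. If the threshold is .9, columns with 90% missing values are dropped.

Output:
1. The data frame df with the columns dropped (if no columns are dropped, the same data frame is returned)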


I have coded this:

class variableTreatment():

    def drop_nan_col(self, df, threshold): 

        self.threshold = threshold
        self.df = df
        for i in df.columns:
            if (float(df[i].isnull().sum())/df[i].shape[0]) > threshold:
                df = df.drop(i)

The method must take "self, df, and threshold" and I cannot add any more parameters. The code has to pass the following test case:

import pandas as pd
import numpy as np
df = pd.read_excel('CKD.xlsx')

VT = variableTreatment()

VT

VT.drop_nan_col(df, 0.9).head()

When I run VT.drop_nan_col(df, 0.9).head() (I cannot change this line of code), I get:

KeyError: "['yls'] not found in axis"

If I change the shape index to 0 instead of 1, which I suspect is not the right thing to do anyway, I get:

IndexError: tuple index out of range

Can someone help me understand how to fix this?

Answer 1

I think you need to change

df = df.drop(i)

to

df = df.drop(i, axis=1)

Passing axis=1 makes drop look among the columns rather than the rows, which is the default. See the same error here: https://stackoverflow.com/a/44931865/5184851
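A small sketch of that difference, using a made-up three-row frame (the "age" values are hypothetical; "yls" just mirrors the column name from the error message):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's CKD.xlsx
df = pd.DataFrame({"age": [48, 53, 61], "yls": [np.nan, np.nan, np.nan]})

try:
    df.drop("yls")  # default axis=0: searches the row labels for "yls"
except KeyError as exc:
    print("KeyError:", exc)

dropped = df.drop("yls", axis=1)  # axis=1: searches the column labels
print(list(dropped.columns))  # ['age']
```

Note that drop returns a new frame, so df itself still has both columns afterwards.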

In addition, for .head() to work on the result, drop_nan_col(...) needs to return the data frame, i.e. the method should end with return df.
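Putting both points together, the method could look roughly like the sketch below. The small frame stands in for CKD.xlsx and is purely hypothetical; the comparison keeps the strict > from the question, so a column is dropped only when its missing fraction exceeds the threshold.

```python
import numpy as np
import pandas as pd


class variableTreatment():

    def drop_nan_col(self, df, threshold):
        self.threshold = threshold
        self.df = df
        for i in df.columns:
            # Fraction of missing values in column i
            if float(df[i].isnull().sum()) / df[i].shape[0] > threshold:
                df = df.drop(i, axis=1)  # drop a column, not a row
        return df  # returned so that .head() can be chained


# Hypothetical stand-in for pd.read_excel('CKD.xlsx')
df = pd.DataFrame({
    "age": [48.0, 53.0, np.nan, 61.0],      # 25% missing, kept
    "yls": [np.nan, np.nan, np.nan, np.nan],  # 100% missing, dropped
})

VT = variableTreatment()
result = VT.drop_nan_col(df, 0.9)
print(result.head())
```

If the loop is not a requirement, something like df.loc[:, df.isnull().mean() <= threshold] should select the same columns in one step, but that is an alternative, not what the question's skeleton asks for.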
