Question

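I am currently working with a dataset of more than 100 columns. The first four columns give me the essential information: label, description, target, and department. All the other columns hold my data values. For some rows of essential information, every data value is null. I want to delete all rows in which all of the data values are null.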

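Here is what I did, and it was the long way around. First, I split the whole table into two tables: df1 stored my essential information (label, description, target, department) and df2 stored my data values. On df2 I used the isnull() method to find which index values had rows that were entirely null, and I noted those index values down. Then I concatenated the two tables back together and dropped the rows at the index values I had noted.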

import pandas as pd

df1 = pd.read_excel('***.xlsx', skiprows=5)

df2 = df1.iloc[:, 4:]

# Used this to note down the index of the rows whose data values are all null
df2[df2.isnull().all(axis=1)]

# Used this to get rid of the data value columns and leave only the essential information columns
df1.drop(df1.iloc[:, 4:], axis=1, inplace=True)

new_df = pd.concat([df1, df2], axis=1)

new_df.drop(new_df.index[[430, 431, 432]], inplace=True)

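The approach above does work. However, it feels long-winded, so I would like to know whether there is a shorter way. Thank you very much for your help.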

Answer 1

If I understand correctly, you are looking for dropna:

df1.dropna(how='all', subset=df1.columns[4:])

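This drops only the rows in which every column from position 4 onward (that is, the fifth column and everything after it) is null. Note that dropna returns a new DataFrame rather than modifying df1, so assign the result if you want to keep it.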
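As a minimal sketch of that call on a small made-up frame (the column names and values are hypothetical, and two data columns stand in for the 100+ in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sheet: four info columns, then data columns.
df1 = pd.DataFrame({
    "Label": ["a", "b", "c"],
    "Description": ["x", "y", "z"],
    "Target": [1, 2, 3],
    "Department": ["d1", "d2", "d3"],
    "v1": [1.0, np.nan, np.nan],
    "v2": [np.nan, np.nan, 5.0],
})

# Drop rows whose data columns (position 4 onward) are ALL null.
# Rows with at least one non-null data value are kept.
cleaned = df1.dropna(how="all", subset=df1.columns[4:])

print(cleaned.index.tolist())  # [0, 2]
```

Only the middle row is removed, because it is the only one with no data values at all. The original df1 is left unchanged.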

Edit: since you actually want to drop the rows in which all values are 0, this should do it:

df1 = df1[~(df1.iloc[:, 4:] == 0).all(axis=1)]
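The same filter can be tried on a small made-up frame. This is only a sketch; the column names and values are hypothetical, and the intermediate name all_zero is introduced here for readability:

```python
import pandas as pd

# Hypothetical miniature of the sheet: four info columns, then data columns.
df1 = pd.DataFrame({
    "Label": ["a", "b", "c"],
    "Description": ["x", "y", "z"],
    "Target": [1, 2, 3],
    "Department": ["d1", "d2", "d3"],
    "v1": [0, 0, 3],
    "v2": [0, 7, 0],
})

# True for rows where every data column (position 4 onward) equals 0.
all_zero = (df1.iloc[:, 4:] == 0).all(axis=1)

# Keep everything except the all-zero rows.
df1 = df1[~all_zero]

print(df1.index.tolist())  # [1, 2]
```

A row is dropped only when every data column is 0; a single non-zero value is enough to keep it.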
