Question

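I am trying to clean a very large DataFrame with pandas.

The dataset contains duplicated columns for the metrics height, weight, sex, and age. Some rows have their data in the column currentAge, while others have it in the column currentAge2.

So I want to drop the rows that have NaN in both currentAge and currentAge2, for example, because they are useless data points. I want to do the same for all the other metrics.

The DataFrame's index starts at 0. Below is the code I tried.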

for index, row in csv.iterrows():
    if ((math.isnan(row['currentAge']) and math.isnan(row['currentAge2'])) == True):
        csv.drop(csv.index[index])

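This does not work, and when I use inplace=True I get an index-out-of-range error. If anyone could shed some light on how to clean this DataFrame properly, that would be great. csv is the name of my DataFrame.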

Answer 1

I don't think we need iterrows here.

csv[~(csv['currentAge'].isnull() & csv['currentAge2'].isnull())]
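
As a minimal sketch of the boolean-mask approach, here is the same idea on a small toy frame. The data values are made up for illustration; only the column names come from the question. The negation has to wrap the whole "both are null" condition, and the surviving rows keep their original index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `csv` DataFrame.
csv = pd.DataFrame({
    "currentAge":  [30.0, np.nan, np.nan, 41.0],
    "currentAge2": [np.nan, 25.0, np.nan, np.nan],
})

# True only where BOTH age columns are missing.
both_missing = csv["currentAge"].isnull() & csv["currentAge2"].isnull()

# Keep every row that is not missing in both columns.
cleaned = csv[~both_missing]

print(cleaned.index.tolist())  # [0, 1, 3]
```

Row 2 is the only one dropped, since it is the only row with NaN in both columns.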

Answer 2

If you want to drop the rows that have NaN in both currentAge and currentAge2, you can also try:

csv.dropna(how='all', subset=['currentAge','currentAge2'], inplace=True)

The docs explain how the how and subset parameters work. This is also easier to use if you need to take more columns into account.

I hope that helps.
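
To apply the same rule to the other metrics, one possible sketch is to loop over the column pairs and call dropna once per pair. The names height and height2 and all data values below are assumptions for illustration; only currentAge and currentAge2 appear in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical column pairs; only currentAge/currentAge2 come from the question.
column_pairs = [
    ["currentAge", "currentAge2"],
    ["height", "height2"],
]

# Hypothetical toy data standing in for the real `csv` DataFrame.
csv = pd.DataFrame({
    "currentAge":  [30.0, np.nan, np.nan, 41.0],
    "currentAge2": [np.nan, 25.0, np.nan, np.nan],
    "height":      [170.0, np.nan, 180.0, 165.0],
    "height2":     [np.nan, np.nan, np.nan, 166.0],
})

for pair in column_pairs:
    # Drop a row only when every column in this pair is NaN.
    csv = csv.dropna(how="all", subset=pair)

print(csv.index.tolist())  # [0, 3]
```

The remaining rows keep their original index labels. If a fresh 0-based index is wanted instead, csv.reset_index(drop=True) can be called afterwards.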

Drop rows containing NaN while preserving the index
