Removing NaN values in Python pandas

Time: 2013-11-18 17:05:59

Tags: python csv pandas

The data is the adult income dataset from the census. The rows look like this:

31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K

I am trying to drop every row that contains a NaN from a DataFrame loaded from a CSV file in pandas.

>>> import pandas as pd
>>> income = pd.read_csv('income.data')
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object)
>>> income.dropna(how='any') # should drop all rows with NaNs
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object) # what??
>>> income = income.dropna(how='any') # ok, maybe reassignment will work?
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object) # what??

I also tried a smaller example.csv:

label,age,sex
1,43,M
-1,NaN,F
1,65,NaN

and dropna() works fine there for both the categorical and the numeric NaNs. What is going on? I am new to pandas and still learning the ropes.

2 answers:

Answer 0: (score: 6)

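As I wrote in the comment: the "NaN" has a leading space (at least in the data you provided), so pandas reads it as the ordinary string " NaN" rather than as a missing value. You therefore need to pass the na_values parameter to the read_csv function.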

Try this:

df = pd.read_csv("income.csv",header=None,na_values=" NaN")

That is also why your second example works: there is no leading space there.
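The effect of the leading space can be reproduced with a minimal sketch. The two rows below are shortened, made-up versions of the rows in the question, read from an in-memory string instead of income.data. The skipinitialspace=True variant is an alternative that is not part of this answer; it is an existing read_csv option that strips the space after each delimiter.

```python
import io
import pandas as pd

# Shortened, hypothetical rows in the same style as the question's data:
# every field after the first is preceded by a space.
csv_text = (
    "31, Private, Sales, NaN, >50K\n"
    "48, Self-emp-not-inc, Prof-specialty, United-States, <=50K\n"
)

# Default parsing: " NaN" (with the space) is kept as an ordinary string.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
raw_missing = int(raw[3].isna().sum())

# The fix from this answer: declare " NaN" as a missing-value marker.
fixed = pd.read_csv(io.StringIO(csv_text), header=None, na_values=" NaN")
fixed_rows = len(fixed.dropna(how="any"))

# Alternative (an assumption, not from the answer): strip the space after
# each delimiter so the default "NaN" marker is recognized.
stripped = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True)
stripped_rows = len(stripped.dropna(how="any"))

print(raw_missing, fixed_rows, stripped_rows)
```

With skipinitialspace=True the remaining string values also lose their leading space, which may or may not be what you want for later comparisons.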

Answer 1: (score: -1)

Import pandas and numpy.

Read the CSV file you are working with:

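import pandas as pd
import numpy as np

data = pd.read_csv("data.csv")
# turn empty strings into real NaN values
data = data.replace('',np.nan)
# note: axis="columns" drops every column that contains a NaN;
# use axis="index" (the default) to drop rows instead
data = data.dropna(axis="columns", how="any")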

# display the first 10 rows

data.head(10)
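For comparison, here is a small sketch of what the axis argument of dropna does, using the example.csv contents from the question loaded from an in-memory string. The variable names are made up for illustration. It also shows that dropna returns a new DataFrame and leaves the original untouched unless the result is assigned.

```python
import io
import pandas as pd

# Contents of the small example.csv from the question.
csv_text = "label,age,sex\n1,43,M\n-1,NaN,F\n1,65,NaN\n"
df = pd.read_csv(io.StringIO(csv_text))

# Drop every row that contains at least one NaN (what the question asks for).
rows_kept = df.dropna(axis="index", how="any")

# Drop every column that contains at least one NaN (what axis="columns" does).
cols_kept = df.dropna(axis="columns", how="any")

print(rows_kept)
print(cols_kept)
print(len(df))  # the original DataFrame still has all of its rows
```

Here the row-wise call keeps only the first data row, while the column-wise call keeps only the label column.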