In my data, some features have the value ?. How can I replace them with NA?
Edit
The code and its output are as follows:
import numpy as np
import pandas as pd

df = pd.read_csv("cca-census-income.csv", header=None)
df.replace('?', np.nan, inplace=True)
df.iloc[0]  # first row; df.ix was removed in pandas 1.0
23 Other relative of householder
24 1700.09
25 ?
26 ?
27 ?
28 Not in universe under 1 year old
29 ?
30 0
Answer 0: (score: 3)
Add the parameter na_values='?' to read_csv, so that ? is parsed as NaN while the file is being read.
Sample:
import pandas as pd
import io
temp=u"""Date Time,a
2010-01-27 16:00:00,?
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp),na_values='?')
print (df)
Date Time a
0 2010-01-27 16:00:00 NaN
1 2010-01-27 16:10:00 2.2
2 2010-01-27 16:30:00 1.7
Edit:
Thanks to 'shivsn' for suggesting adding skipinitialspace=True, which skips the spaces that follow each delimiter:
temp=u"""Date Time,a
? , ?
? ,?
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp),na_values=['?', '? '], skipinitialspace =True)
print (df)
Date Time a
0 NaN NaN
1 NaN NaN
2 2010-01-27 16:30:00 1.7
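EDIT1, based on the file:
It looks like there is only a space before each ?, so na_values=['?'] together with skipinitialspace=True is enough: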
df = pd.read_csv('census-income.data',
header = None,
na_values=['?'],
skipinitialspace =True)
print (df)
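A quick way to check this is to read a small in-memory excerpt both ways. This is only a sketch: the two rows below are made up for illustration and are not taken from census-income.data. It shows that without the extra options the cell holds ' ?' (with the leading space), which is why a plain replace('?', ...) finds nothing to replace.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; each '?' follows a comma and a space.
raw = "39, ?, 77516\n50, Private, ?\n"

# Default parsing keeps the leading space, so the cell is ' ?', not '?'.
plain = pd.read_csv(io.StringIO(raw), header=None)

# With skipinitialspace=True the space is dropped and na_values matches '?'.
fixed = pd.read_csv(io.StringIO(raw), header=None,
                    na_values=['?'], skipinitialspace=True)

print(repr(plain.iloc[0, 1]))  # shows the leading space in the raw cell
print(fixed.isna().sum())      # number of missing values per column
```

Counting NaN per column with isna().sum() is a convenient sanity check on the real file as well.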
Answer 1: (score: 1)
Use replace after reading the file:
# \? matches a literal question mark; \s* allows surrounding spaces
df.replace(r'^\s*\?\s*$', np.nan, inplace=True, regex=True)
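A minimal sketch of this post-read approach, assuming the markers may carry a leading space as in the question's output. In a regular expression an unescaped ? is a quantifier, so the pattern has to escape it, and anchoring it to the whole cell keeps other values untouched. The sample data and column names are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: some '?' markers carry a leading space.
raw = "age,workclass\n39, ?\n50, Private\n38,?\n"
df = pd.read_csv(io.StringIO(raw))

# Replace only cells that consist of '?' plus optional surrounding whitespace.
cleaned = df.replace(r'^\s*\?\s*$', np.nan, regex=True)
print(cleaned)
```

Note that this leaves the leading space on the remaining strings (for example ' Private'); reading with skipinitialspace=True avoids that.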