I'm trying to read in some population data as an exercise in learning pandas:
>>> countries = pd.read_table('country_data.txt',
                              thousands=',',
                              header=None,
                              names=["Country Name", "Area (km^2)", "Areami2",
                                     "Population", "Densitykm2", "Densitymi2",
                                     "Date", "Source"],
                              usecols=["Country Name", "Area (km^2)", "Population"],
                              index_col="Country Name"
                              )
>>> countries.head()
gives
              Area (km^2) Population
Country Name
Monaco                  2     36,136
Singapore             716  5,399,200
Vatican City         0.44        800
Bahrain               757  1,234,571
Malta                 315    416,055
Even though I specified thousands=',', it looks like the population is being read in as a string:
>>> countries.ix["Singapore"]["Population"]
'5,399,200'
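I've tried moving the thousands=',' argument around within the read_table call, and I've checked the data to see whether something in it is messed up, but there are only numeric values there, and I don't know where else to look...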
Answer 0 (score: 3)
This is a bug in 0.12, and it has been fixed in the (soon to be released) 0.13.
Until then, I recommend fixing up the column manually:
In [11]: df['Population'].str.replace(',', '').astype(int) # or float
Out[11]:
0 36136
1 5399200
2 800
3 1234571
4 416055
Name: Population, dtype: int64
In [12]: df['Population'] = df['Population'].str.replace(',', '').astype(int)
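For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The inline sample data and the helper name strip_thousands are my own assumptions for illustration, not part of the question. It passes thousands=',' to the reader and then applies the manual cleanup only if the column still came back as strings, so it should behave the same whether or not the installed pandas has the fix:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample mirroring three columns of the question's data.
SAMPLE = (
    "Monaco\t2\t36,136\n"
    "Singapore\t716\t5,399,200\n"
    "Vatican City\t0.44\t800\n"
)


def strip_thousands(series):
    """Turn strings such as '5,399,200' into integers.

    A column that is already numeric is returned unchanged, so the helper
    is safe to call whether or not the parser handled the separators.
    """
    if pd.api.types.is_numeric_dtype(series):
        return series
    return series.str.replace(",", "", regex=False).astype(int)


countries = pd.read_csv(
    io.StringIO(SAMPLE),
    sep="\t",
    header=None,
    names=["Country Name", "Area (km^2)", "Population"],
    index_col="Country Name",
    thousands=",",
)

# Fallback cleanup: a no-op when read_csv already produced integers.
countries["Population"] = strip_thousands(countries["Population"])

print(countries.loc["Singapore", "Population"])
```

The helper checks the dtype first because calling .str on a column that is already numeric raises an error.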