Pandas read_table() thousands=',' not working

Asked: 2013-11-17 23:39:54

Tags: python pandas

I'm trying to read in some population data as an exercise in learning pandas:

>>> countries = pd.read_table('country_data.txt',
                             thousands=',',
                             header=None,
                             names=["Country Name", "Area (km^2)", "Areami2",
                                    "Population", "Densitykm2", "Densitymi2",
                                    "Date", "Source"],
                             usecols=["Country Name", "Area (km^2)", "Population"],
                             index_col="Country Name"
                             )
>>> countries.head()

which gives

                Area (km^2) Population
Country Name        
Monaco             2     36,136
Singapore        716     5,399,200
Vatican City     0.44    800
Bahrain          757     1,234,571
Malta            315     416,055

Even though I specified thousands=',', it looks like the Population column is being read in as strings:

>>> countries.ix["Singapore"]["Population"]
'5,399,200'

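I've tried moving the thousands=',' argument around within the read_table call, and I've checked the data to see whether anything is malformed, but the column contains only numeric values, and I don't know where else to look...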

1 Answer:

Answer 0 (score: 3)

This is a bug in 0.12, and it has been fixed in the (soon to be released) 0.13.
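To check whether a given installation already has the fix, something like the following sketch may help. It assumes a tab-separated file; the inline sample text stands in for country_data.txt and is made up for illustration, with only three columns kept for brevity.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for country_data.txt.
raw = (
    "Monaco\t2\t36,136\n"
    "Singapore\t716\t5,399,200\n"
    "Vatican City\t0.44\t800\n"
)

countries = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    thousands=",",  # the option the question is about
    header=None,
    names=["Country Name", "Area (km^2)", "Population"],
    index_col="Country Name",
)

# On a release with the fix, Population should be an integer column,
# not a column of strings.
print(countries.dtypes)
print(countries.loc["Singapore", "Population"])
```

If Population still shows up with dtype object, the installed version is probably affected and the manual conversion is the safer route.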

Until then, I'd recommend converting the column manually:

In [11]: df['Population'].str.replace(',', '').astype(int)  # or float
Out[11]: 
0      36136
1    5399200
2        800
3    1234571
4     416055
Name: Population, dtype: int64

In [12]: df['Population'] = df['Population'].str.replace(',', '').astype(int)
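As a self-contained sketch of that workaround applied to a frame shaped like the one in the question: the inline sample data is made up for illustration, and dtype=str is used here only to reproduce the "read as strings" situation on versions where the bug no longer occurs.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for country_data.txt.
raw = (
    "Monaco\t2\t36,136\n"
    "Singapore\t716\t5,399,200\n"
    "Vatican City\t0.44\t800\n"
)

# Force Population to be read as strings, mimicking the buggy behaviour.
countries = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=["Country Name", "Area (km^2)", "Population"],
    index_col="Country Name",
    dtype={"Population": str},
)

# Strip the thousands separators, then convert to integers.
# regex=False treats "," as a literal string rather than a pattern.
countries["Population"] = (
    countries["Population"].str.replace(",", "", regex=False).astype(int)
)

print(countries["Population"])
```

After the conversion, countries.loc["Singapore", "Population"] is a number, so arithmetic and sorting on the column behave as expected.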