Question

I'm trying to read in some population data as an exercise in learning pandas:

>>> countries = pd.read_table('country_data.txt',
                             thousands=',',
                             header=None,
                             names=["Country Name", "Area (km^2)", "Areami2",
                                    "Population", "Densitykm2", "Densitymi2",
                                    "Date", "Source"],
                             usecols=["Country Name", "Area (km^2)", "Population"],
                             index_col="Country Name"
                             )
>>> countries.head()

which gives

              Area (km^2)  Population
Country Name
Monaco                  2      36,136
Singapore             716   5,399,200
Vatican City         0.44         800
Bahrain               757   1,234,571
Malta                 315     416,055

Even though I specified thousands=',', it looks like the populations are being read in as strings:

>>> countries.loc["Singapore", "Population"]
'5,399,200'

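I've tried moving the thousands=',' argument around within the read_table call, and I've checked the data to see whether something in it is malformed, but there are only numeric values there, and I don't know where else to look...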

Answer 1

This is a bug in 0.12, which has been fixed in the (soon to be released) 0.13.
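For reference, here is a minimal sketch of what the call should produce on a version where the fix is in place. It assumes a tab-separated input and uses a hypothetical in-memory sample (`raw`) holding only three of the question's columns, so it is an illustration rather than a reproduction of the original file:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample with three of the question's columns.
raw = (
    "Monaco\t2\t36,136\n"
    "Singapore\t716\t5,399,200\n"
    "Vatican City\t0.44\t800\n"
    "Bahrain\t757\t1,234,571\n"
    "Malta\t315\t416,055\n"
)

countries = pd.read_table(
    io.StringIO(raw),
    thousands=",",  # strip thousands separators while parsing numbers
    header=None,
    names=["Country Name", "Area (km^2)", "Population"],
    index_col="Country Name",
)

# Population should now have an integer dtype instead of object (strings).
print(countries.dtypes)
print(countries.loc["Singapore", "Population"])
```

Checking `countries.dtypes` is a quick way to confirm whether the separator was actually honored: an `object` dtype for Population means the values are still strings.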

Until then, I'd recommend manually munging the column:

In [11]: df['Population'].str.replace(',', '').astype(int)  # or float
Out[11]: 
0      36136
1    5399200
2        800
3    1234571
4     416055
Name: Population, dtype: int64

In [12]: df['Population'] = df['Population'].str.replace(',', '').astype(int)
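The same workaround as a self-contained sketch, in case it helps to try it outside the interactive session. The frame `df` here is a hypothetical stand-in built from the values shown in the question, and `regex=False` is added on the assumption that a reasonably recent pandas is in use, so the comma is treated as a literal character:

```python
import pandas as pd

# Hypothetical stand-in for the frame that was read with string populations.
df = pd.DataFrame(
    {"Population": ["36,136", "5,399,200", "800", "1,234,571", "416,055"]}
)

# Drop the thousands separators, then convert the cleaned strings to integers.
df["Population"] = (
    df["Population"].str.replace(",", "", regex=False).astype("int64")
)

print(df["Population"])
```

If the column may contain missing or non-numeric entries, `astype` will raise, so `float` (as noted above) or a separate cleaning step may be the safer choice.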

Pandas read_table() thousands=',' not working
