Pandas read_html interprets commas as dates instead of integers

Asked: 2014-05-04 19:39:56

Tags: python pandas

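As the code below shows, columns 3, 8, 9 and 10 are misinterpreted as datetime objects, and columns 1, 6 and 7 should be integers. Only columns 2, 4, 5 and 11 appear to have been read correctly. How can I force the columns to be interpreted as the correct types? I could pass 'infer_types=False' and do the conversion manually afterwards, I suppose.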

In [63]: import pandas as pd
In [64]: path = r"http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
In [65]: tables = pd.read_html(path)
In [66]: df = tables[1]

In [67]: df.head()
Out[67]:
        1           2          3         4         5        6        7   8   \
1  !000001  California        NaT  37253956  33871648  !000053  !000055 NaT
2  !000002       Texas        NaT  25145561  20851820  !000036  !000038 NaT
3  !000003    New York 1965-11-27  19378102  18976457  !000027  !000029 NaT
4  !000004     Florida        NaT  18801310  15982378  !000027  !000029 NaT
5  !000005    Illinois        NaT  12830632  12419293  !000018  !000020 NaT

   9   10      11
1 NaT NaT  11.91%
2 NaT NaT   8.04%
3 NaT NaT   6.19%
4 NaT NaT   6.01%
5 NaT NaT   4.10%

[5 rows x 11 columns]

dtype: object

In [68]: df.dtypes
Out[68]:
1             object
2             object
3     datetime64[ns]
4             object
5             object
6             object
7             object
8     datetime64[ns]
9     datetime64[ns]
10    datetime64[ns]
11            object
dtype: object
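
The manual conversion mentioned above could look roughly like the following. This is only a sketch, not a confirmed fix for read_html itself: the frame `raw` is a hypothetical stand-in for the table once its cells are kept as strings, the comma-separated values in column 4 are an assumption based on the title, and `to_int` is a helper name made up for this illustration.

```python
import pandas as pd

# Hypothetical sample imitating the string cells shown above.
# The thousands separators in column 4 are an assumption, not taken from the output.
raw = pd.DataFrame({
    1: ["!000001", "!000002", "!000003"],
    2: ["California", "Texas", "New York"],
    4: ["37,253,956", "25,145,561", "19,378,102"],
    11: ["11.91%", "8.04%", "6.19%"],
})


def to_int(series):
    """Drop every non-digit character (sort-key prefix, commas), then parse as int."""
    digits = series.astype(str).str.replace(r"\D", "", regex=True)
    return pd.to_numeric(digits).astype("int64")


clean = raw.copy()
for col in (1, 4):
    clean[col] = to_int(raw[col])

# Percentages: strip the trailing '%' and parse as float.
clean[11] = raw[11].str.rstrip("%").astype(float)

print(clean.dtypes)
```

Stripping all non-digits is a blunt choice: it handles both the "!0000" prefix and any thousands separators in one pass, but it would also silently discard a minus sign or a decimal point, so it only suits columns known to hold non-negative whole numbers.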

0 answers:

No answers