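As the code below shows, columns 3, 8, 9, and 10 were misinterpreted as datetime objects. Columns 1, 6, and 7 should be integers. How can I force the columns to be interpreted as the correct types? Only columns 2, 4, 5, and 11 seem to have been read correctly. I suppose I could pass 'infer_types=False' and do the conversion manually afterwards.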
In [63]: import pandas as pd
In [64]: path = r"http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
In [65]: tables = pd.read_html(path)
In [66]: df = tables[1]
In [67]: df.head()
Out[67]:
         1           2           3         4         5        6        7    8  \
1  !000001  California         NaT  37253956  33871648  !000053  !000055  NaT
2  !000002       Texas         NaT  25145561  20851820  !000036  !000038  NaT
3  !000003    New York  1965-11-27  19378102  18976457  !000027  !000029  NaT
4  !000004     Florida         NaT  18801310  15982378  !000027  !000029  NaT
5  !000005    Illinois         NaT  12830632  12419293  !000018  !000020  NaT

     9   10      11
1  NaT  NaT  11.91%
2  NaT  NaT   8.04%
3  NaT  NaT   6.19%
4  NaT  NaT   6.01%
5  NaT  NaT   4.10%
[5 rows x 11 columns]
In [68]: df.dtypes
Out[68]:
1 object
2 object
3 datetime64[ns]
4 object
5 object
6 object
7 object
8 datetime64[ns]
9 datetime64[ns]
10 datetime64[ns]
11 object
dtype: object
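For reference, the manual conversion I have in mind would look roughly like the sketch below. It assumes the columns arrive as raw strings; the small `raw` frame and the `clean_columns` helper are made up for illustration and only mimic a few columns of the table above, they are not the real `read_html` output.

```python
import pandas as pd

# Hypothetical sample that mimics the raw strings shown in the table above.
# In practice this frame would come from pd.read_html(...).
raw = pd.DataFrame(
    {
        1: ["!000001", "!000002", "!000003"],
        2: ["California", "Texas", "New York"],
        4: ["37253956", "25145561", "19378102"],
        6: ["!000053", "!000036", "!000027"],
        11: ["11.91%", "8.04%", "6.19%"],
    }
)


def clean_columns(df):
    """Return a copy of df with the string columns converted to numbers."""
    out = df.copy()
    # Rank-style columns: drop the leading "!" sort key, then parse as integers.
    for col in (1, 6):
        out[col] = pd.to_numeric(out[col].str.lstrip("!"))
    # Population column: plain digit strings.
    out[4] = pd.to_numeric(out[4])
    # Percentage column: drop the trailing "%" and parse as float.
    out[11] = pd.to_numeric(out[11].str.rstrip("%"))
    return out


converted = clean_columns(raw)
print(converted.dtypes)
```

This works, but it means knowing every column's format in advance, which is what I was hoping to avoid.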