Pandas read_html() converts numbers in table to string

Time: 2015-09-14 16:01:03

Tags: python html pandas

I have an HTML table that I want to convert to a pandas DataFrame. I'm using pandas.read_html(), and it works except that it reads the numbers in my table as strings.

import pandas

# html_table_here is a placeholder for the table's row markup
table = html_table_here
table = '<table border="1" class="dataframe">\n  '+table+'    \n</table>'
df_table = pandas.read_html(table,header=None,index_col=0)
df = pandas.concat(df_table)

Running print df["TASK_ID"][0] then returns "3" instead of 3.

Is there any way to preserve the types of the values in an HTML table when converting it to a pandas DataFrame?

1 answer:

Answer 0 (score: 0)

You have to explicitly convert those strings to numeric dtypes, because pandas can't tell what the dtypes are when the values are stored as str in the HTML. Calling convert_objects should work:

# read_html returns a list of DataFrames, so convert the concatenated frame
df = df.convert_objects(convert_numeric=True)
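
Note that convert_objects was deprecated and later removed from pandas, so on a recent version something like pandas.to_numeric is probably the way to go. Below is a minimal sketch of that idea, not a definitive fix. The sample frame and the helper name to_numeric_where_possible are made up for illustration; the frame just stands in for whatever read_html plus concat gave you, with every value stored as a string.

```python
import pandas as pd

# Hypothetical stand-in for the frame built from read_html: all values are strings.
df = pd.DataFrame({
    "TASK_ID": ["3", "7", "12"],
    "NAME": ["build", "test", "deploy"],
})

def to_numeric_where_possible(frame):
    """Return a copy in which every fully numeric column has a numeric dtype."""
    out = frame.copy()
    for col in out.columns:
        try:
            # Raises if any value in the column cannot be parsed as a number.
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # leave non-numeric columns untouched
    return out

converted = to_numeric_where_possible(df)
print(converted.dtypes)
print(converted["TASK_ID"][0])
```

If you only care about one column, pd.to_numeric(df["TASK_ID"]) on its own should be enough; the loop is only there so that text columns such as NAME are left alone.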