pandas.read_html: convert only specific columns to float

Asked: 2019-11-23 12:56:27

Tags: python pandas type-conversion

I am trying to write a program that reads a table from a website and converts only some of the table's columns to float.

The table on the site looks like this:

Account   Responsible     Grade
1.0.0     João Da Silva   3,5
1.1.0     Antônio Pereira 2,5
1.2.0     Maria do Céu    4,5
1.2.1     Joana Antunes   5,0

To do this, I used BeautifulSoup and pandas.read_html as follows:

from bs4 import BeautifulSoup as bs
import pandas as pd
#############################################################
# This part of the code was omitted to simplify my question #
#############################################################
soup = bs(page_source,'html.parser')
table = soup.find('table',{'id': 'table_id'})
data = pd.read_html(str(table), encoding = 'utf-8', decimal=",", thousands='.')[0]

When I do this, the table is converted as desired, except for the "Account" column. The returned pandas DataFrame looks like this:

Index   1       2               3
0       Account Responsible     Grade
1       100     João Da Silva   3.5
2       110     Antônio Pereira 2.5
3       120     Maria do Céu    4.5
4       121     Joana Antunes   5.0

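My goal is to keep the "Account" column values identical to those in the original table, to avoid any misconversion, and to convert the other table values as usual (in this example, the column types should be [str, str, float]):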

Index   1         2               3
0       Account   Responsible     Grade
1       1.0.0     João Da Silva   3.5
2       1.1.0     Antônio Pereira 2.5
3       1.2.0     Maria do Céu    4.5
4       1.2.1     Joana Antunes   5.0

Is it possible to do this kind of conversion?

Thanks in advance for any help, and best regards.

2 Answers:

Answer 0 (score: 0)

You can try setting a converter for that column.

data = pd.read_html(str(table), encoding = 'utf-8', decimal=",", thousands='.', converters={'Account': str})[0]
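For anyone who wants to try the converter idea in isolation, here is a minimal sketch. The HTML string is made up to mirror the table in the question, and the helper comma_decimal_to_float is a hypothetical name, not part of pandas. Note one deliberate difference from the line above: this sketch passes thousands=None and does the comma handling inside a converter. My understanding is that the thousands separator may be stripped before converters run, so treat that choice as an assumption to check against your pandas version. It also assumes an HTML parser that read_html can use (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Made-up HTML that mirrors the table from the question (an assumption,
# the real page source is not shown in the thread).
HTML = """
<table id="table_id">
  <thead>
    <tr><th>Account</th><th>Responsible</th><th>Grade</th></tr>
  </thead>
  <tbody>
    <tr><td>1.0.0</td><td>João Da Silva</td><td>3,5</td></tr>
    <tr><td>1.1.0</td><td>Antônio Pereira</td><td>2,5</td></tr>
    <tr><td>1.2.0</td><td>Maria do Céu</td><td>4,5</td></tr>
    <tr><td>1.2.1</td><td>Joana Antunes</td><td>5,0</td></tr>
  </tbody>
</table>
"""


def comma_decimal_to_float(text):
    """Hypothetical helper: turn a string such as '3,5' into the float 3.5."""
    return float(text.replace(",", "."))


data = pd.read_html(
    StringIO(HTML),
    thousands=None,  # assumption: disable separator stripping entirely
    converters={
        "Account": str,                   # keep the text exactly as written
        "Grade": comma_decimal_to_float,  # parse only this column as float
    },
)[0]

print(data)
```

Keying the converters by column name relies on the table having header cells (th); if your table has none, the columns are numbered and the keys would need to be integer positions instead.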

Answer 1 (score: 0)

This helps:

data = pd.read_html(str(table), encoding = 'utf-8', thousands="ª", decimal="ª")[0]
data['Grade'] = data['Grade'].apply(lambda x: float(x.replace(',', '.')))

Or this:

data = pd.read_html(str(table), encoding = 'utf-8', thousands=None)[0]
data['Grade'] = data['Grade'].apply(lambda x: float(x.replace(',', '.')))
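The second variant can be sketched end to end as follows. The HTML string is invented to match the question's table, and convert_comma_decimals is a hypothetical helper name that takes the columns to convert as a list, so every other column stays as read. It assumes an HTML parser usable by read_html (lxml, or bs4 with html5lib) is installed, and it uses the vectorized .str.replace plus pd.to_numeric instead of apply with a lambda, which is only a stylistic swap.

```python
from io import StringIO

import pandas as pd

# Invented HTML mirroring the question's table (the real page is not shown).
HTML = """
<table id="table_id">
  <tr><th>Account</th><th>Responsible</th><th>Grade</th></tr>
  <tr><td>1.0.0</td><td>João Da Silva</td><td>3,5</td></tr>
  <tr><td>1.1.0</td><td>Antônio Pereira</td><td>2,5</td></tr>
  <tr><td>1.2.0</td><td>Maria do Céu</td><td>4,5</td></tr>
  <tr><td>1.2.1</td><td>Joana Antunes</td><td>5,0</td></tr>
</table>
"""


def convert_comma_decimals(frame, columns):
    """Hypothetical helper: return a copy of `frame` in which only the
    listed text columns are parsed as floats written with a decimal comma."""
    result = frame.copy()
    for name in columns:
        with_dot = result[name].str.replace(",", ".", regex=False)
        result[name] = pd.to_numeric(with_dot)
    return result


# thousands=None stops pandas from removing the dots in "1.0.0";
# the default decimal="." leaves "3,5" as text for now.
raw = pd.read_html(StringIO(HTML), thousands=None)[0]

# Convert only the Grade column; Account and Responsible are untouched.
data = convert_comma_decimals(raw, ["Grade"])

print(data)
```

Passing the column list explicitly keeps the decision about which columns are numeric in your own code rather than in the parser's inference.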