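I am trying to write a program that reads a table from a website and converts only some of its columns to floats.
The table on the site looks like this: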
Account  Responsible      Grade
1.0.0    João Da Silva    3,5
1.1.0    Antônio Pereira  2,5
1.2.0    Maria do Céu     4,5
1.2.1    Joana Antunes    5,0
To do this, I used BeautifulSoup and pandas.read_html as follows:
from bs4 import BeautifulSoup as bs
import pandas as pd
############################################################
# This part of the code was voided to simplify my question #
############################################################
soup = bs(page_source,'html.parser')
table = soup.find('table',{'id': 'table_id'})
data = pd.read_html(str(table), encoding = 'utf-8', decimal=",", thousands='.')[0]
When I do this, the table is converted the way I want except for the "Account" column. The returned pandas DataFrame looks like this:
Index  1        2                3
0      Account  Responsible      Grade
1      100      João Da Silva    3.5
2      110      Antônio Pereira  2.5
3      120      Maria do Céu     4.5
4      121      Joana Antunes    5.0
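My goal is to keep the "Account" column values exactly as they appear in the original table, to avoid any wrong conversion, while still converting the other values as needed (in this example the column types should be [str, str, float]):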
Index  1        2                3
0      Account  Responsible      Grade
1      1.0.0    João Da Silva    3.5
2      1.1.0    Antônio Pereira  2.5
3      1.2.0    Maria do Céu     4.5
4      1.2.1    Joana Antunes    5.0
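Is it possible to perform this kind of conversion?
Thanks in advance for any help, and best regards.
Answer 0 (score: 0)
You can try setting a converter for that column: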
data = pd.read_html(str(table), encoding = 'utf-8', decimal=",", thousands='.', converters={'Account': str})[0]
Answer 1 (score: 0)
This should help:
data = pd.read_html(str(table), encoding = 'utf-8', thousands="ª", decimal="ª")[0]
data['Grade'] = data['Grade'].apply(lambda x: float(x.replace(',', '.')))
Or this:
data = pd.read_html(str(table), encoding = 'utf-8', thousands=None)[0]
data['Grade'] = data['Grade'].apply(lambda x: float(x.replace(',', '.')))
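The post-processing step in this answer can be tried out without any HTML at all. The following is a minimal sketch, assuming the table has already been read with numeric conversion disabled, so that every cell is still a string and the header row has become the column names. The frame `raw` and the helper `convert_decimal_commas` are hypothetical names used only for illustration; they are not part of pandas or of the original code.

```python
import pandas as pd

# Hypothetical stand-in for the frame read_html would return when no
# numeric conversion is applied: every cell is still a string.
raw = pd.DataFrame(
    {
        "Account": ["1.0.0", "1.1.0", "1.2.0", "1.2.1"],
        "Responsible": [
            "João Da Silva",
            "Antônio Pereira",
            "Maria do Céu",
            "Joana Antunes",
        ],
        "Grade": ["3,5", "2,5", "4,5", "5,0"],
    }
)


def convert_decimal_commas(frame, columns):
    """Return a copy in which the named string columns are parsed as floats.

    Only the listed columns are touched, so identifiers such as
    "1.0.0" in other columns are left exactly as they were read.
    """
    result = frame.copy()
    for name in columns:
        # Swap the decimal comma for a dot (literal match, not a regex),
        # then cast the whole column to float at once.
        result[name] = result[name].str.replace(",", ".", regex=False).astype(float)
    return result


data = convert_decimal_commas(raw, ["Grade"])
print(data)
print(data.dtypes)
```

Listing the columns to convert explicitly keeps the conversion opt-in, which is what protects "Account" from being reinterpreted as a number. If the real table's header row ends up as the first data row instead of as column names, passing `header=0` to `read_html` is one way to get named columns before applying this step.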