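I'm fetching HTML tables by date, so if I search 20 days I get 20 tables, and I want to add all 20 tables into one table so that I can verify the data across the time series. I've tried the pandas merge and add functions, but the values just get added as strings.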
Table 1
[['\xa0', 'All Issues', 'Investment Grade', 'High Yield', 'Convertible'],
['Total Issues Traded', '8039', '5456', '2386', '197'],
['Advances', '3834', '2671', '1075', '88'],
['Declines', '3668', '2580', '994', '94'],
['Unchanged', '163', '54', '99', '10'],
['52 Week High', '305', '100', '193', '12'],
['52 Week Low', '152', '83', '63', '6'],
['Dollar Volume*', '27568', '17000', '9299', '1269']]
Table 2
[['\xa0', 'All Issues', 'Investment Grade', 'High Yield', 'Convertible'],
['Total Issues Traded', '8039', '5456', '2386', '197'],
['Advances', '3834', '2671', '1075', '88'],
['Declines', '3668', '2580', '994', '94'],
['Unchanged', '163', '54', '99', '10'],
['52 Week High', '305', '100', '193', '12'],
['52 Week Low', '152', '83', '63', '6'],
['Dollar Volume*', '27568', '17000', '9299', '1269']]
My code, which adds the values as strings:
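import pandas as pd

# `tables` is the parsed (BeautifulSoup) HTML table.
# Every cell comes from item.text, so every value is a string.
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in tables.select("tr")]
df = pd.DataFrame(tab_data)
df2 = pd.DataFrame(tab_data)
df3 = df.add(df2, fill_value=0)  # concatenates the strings instead of summing
df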
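Why the values end up as strings: each cell comes from `item.text`, so the DataFrame columns have `object` dtype and `add()` falls back to Python's `+`, which concatenates strings. A minimal sketch that skips the HTML parsing and uses a shortened copy of the sample table as a literal list (the name `rows` is only for illustration). It keeps pandas' default positional labels, so it assumes both tables have the same row and column order.

```python
import pandas as pd

# Shortened copy of the scraped rows shown above; every cell is a str.
rows = [
    ["\xa0", "All Issues", "Investment Grade", "High Yield", "Convertible"],
    ["Total Issues Traded", "8039", "5456", "2386", "197"],
    ["Advances", "3834", "2671", "1075", "88"],
]

df = pd.DataFrame(rows)
df2 = pd.DataFrame(rows)

# Object (string) columns: add() concatenates, "8039" + "8039" -> "80398039".
as_strings = df.add(df2, fill_value=0)
print(as_strings.iloc[1, 1])

# Drop the header row and the label column, convert the rest to numbers, then add.
numbers = df.iloc[1:, 1:].apply(pd.to_numeric)
numbers2 = df2.iloc[1:, 1:].apply(pd.to_numeric)
summed = numbers.add(numbers2, fill_value=0)
print(summed.iloc[0, 0])
```

The first print shows `80398039` and the second `16078`, which is the behaviour described in the question versus the intended sum.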
Answer (score: 1)
If you want the numeric cells converted to integers, you have to do it explicitly:
# Convert cells that consist only of digits to int; labels and the '\xa0' header stay str.
tab_data = [[int(item.text) if item.text.isdigit() else item.text
             for item in row_data.select("th,td")]
            for row_data in tables.select("tr")]
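The answer's `isdigit()` conversion can be tried without any HTML by applying it to plain lists. In this sketch the helper name `convert_cells` is made up, and the header row is pulled out into column names so that each remaining column holds only numbers. Note that `str.isdigit()` only accepts unsigned whole numbers, so values such as `-5`, `1.5` or `1,269` would stay strings.

```python
import pandas as pd


def convert_cells(rows):
    """Hypothetical helper: purely numeric strings become int, everything else is kept."""
    return [[int(cell) if cell.isdigit() else cell for cell in row] for row in rows]


rows = [
    ["\xa0", "All Issues", "Investment Grade", "High Yield", "Convertible"],
    ["Advances", "3834", "2671", "1075", "88"],
    ["Declines", "3668", "2580", "994", "94"],
]

converted = convert_cells(rows)

# Row 0 becomes the column names; the '\xa0' column (row labels) becomes the index.
df = pd.DataFrame(converted[1:], columns=converted[0]).set_index("\xa0")
df2 = pd.DataFrame(converted[1:], columns=converted[0]).set_index("\xa0")

# Columns are now int64, so add() sums element-wise, aligned by row/column label.
summed = df.add(df2, fill_value=0)
print(summed.loc["Advances", "All Issues"])
```

Because both frames are indexed by the row labels, `add()` matches "Advances" with "Advances" even if a later table lists its rows in a different order.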
Hope that helps.
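For the original goal of checking values across the 20 daily tables, summing them may not be what is wanted; stacking them with a date level keeps each day's numbers separate. A sketch using `pd.concat` with a dict, so the dict keys become an outer `Date` index level. It assumes a hypothetical `daily_tables` dict mapping a date string to that day's scraped rows; the two dates below are placeholders and both reuse the sample rows from the question.

```python
import pandas as pd


def to_frame(rows):
    """Header row -> columns, first column -> index, remaining cells -> numbers."""
    header = ["Metric"] + rows[0][1:]  # replace the blank '\xa0' header cell
    frame = pd.DataFrame(rows[1:], columns=header).set_index("Metric")
    return frame.apply(pd.to_numeric)


sample = [
    ["\xa0", "All Issues", "Investment Grade", "High Yield", "Convertible"],
    ["Advances", "3834", "2671", "1075", "88"],
    ["Declines", "3668", "2580", "994", "94"],
]

# Hypothetical input: one entry per scraped date (placeholder dates, same rows).
daily_tables = {"2020-01-02": sample, "2020-01-03": sample}

# Stack every day into one frame with a (Date, Metric) MultiIndex.
combined = pd.concat(
    {date: to_frame(rows) for date, rows in daily_tables.items()},
    names=["Date"],
)

# One value per date for a single metric, i.e. a time series to inspect.
advances = combined.xs("Advances", level="Metric")["All Issues"]
print(advances)
```

If a daily total is still needed, `combined.groupby(level="Metric").sum()` collapses the dates back into one table.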