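I have code that scrapes tables from a website and reads them into pandas DataFrames. However, because of the way the site is designed, this is done in a for loop. As a result, every table ends up under the same name, i.e. each one is assigned to df in turn.
Code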
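import bs4
import pandas

# driver is an existing browser driver object that exposes page_source
soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
for thead in soup.select(".data-point-container table thead"):
    # Each header is followed by its body; rebuild a standalone <table> from the pair
    tbody = thead.find_next_sibling("tbody")
    table = "<table>%s</table>" % (str(thead) + str(tbody))
    df = pandas.read_html(str(table))[0]
    print(df)
    print('-------------')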
Result
Table1 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
4 item5 value1 value2 value3 value4 value5
5 item6 value1 value2 value3 value4 value5
-------------
Table2 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
-------------
Table3 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
4 item5 value1 value2 value3 value4 value5
5 item6 value1 value2 value3 value4 value5
-------------
Table4 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
4 item5 value1 value2 value3 value4 value5
5 item6 value1 value2 value3 value4 value5
6 item7 value1 value2 value3 value4 value5
7 item8 value1 value2 value3 value4 value5
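Is there a way to concatenate/merge all of these DataFrames into a single DataFrame?
Answer 0 (score: 1)
If all you need to do is combine multiple DataFrames, you can collect them in a list and then combine them with pandas.concat.
Something like this should work: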
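dataframes = []
for thead in soup.select(...):
    # your scraper logic here
    df = pandas.read_html(...)[0]  # read_html returns a list of DataFrames; take the first
    dataframes.append(df)
combined = pandas.concat(dataframes)  # concat returns a new DataFrame, so keep the result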
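One thing to watch for with the sample output in the question: each table's first column has a different header (Table1, Table2, ...), so a plain concat keeps those as separate columns and fills the gaps with NaN. Here is a minimal sketch of one way around that, assuming you want the rows stacked and the table name kept as its own column. The two small frames are made-up stand-ins for the scraped tables, and the names combine_tables, "item" and "table" are mine, not anything from pandas or the site.

```python
import pandas as pd

# Hypothetical stand-ins for two scraped tables (placeholder values only)
table1 = pd.DataFrame({
    "Table1": ["item1", "item2"],
    "FY2012": ["value1", "value1"],
    "FY2013": ["value2", "value2"],
})
table2 = pd.DataFrame({
    "Table2": ["item1"],
    "FY2012": ["value1"],
    "FY2013": ["value2"],
})


def combine_tables(frames):
    """Stack frames vertically, moving each first-column header into a 'table' column."""
    normalized = []
    for df in frames:
        table_name = df.columns[0]
        # rename returns a new DataFrame, so the caller's frame is left untouched
        renamed = df.rename(columns={table_name: "item"})
        # Record which table each row came from
        renamed.insert(0, "table", table_name)
        normalized.append(renamed)
    # ignore_index=True gives the result one continuous 0..n-1 index
    return pd.concat(normalized, ignore_index=True)


combined = combine_tables([table1, table2])
print(combined)
```

If the first-column headers really are identical on your site, you can skip the renaming and call pandas.concat on the list directly.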
Does this help?