Question

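I am trying to use BeautifulSoup to extract data from an HTML table and convert it into an n x 7 DataFrame with the columns: Date, Transaction, Manifest Number, Ship Date, Payment Type, Amount, and Prepaid Balance.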

So far

My code snippet:

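from bs4 import BeautifulSoup

def find_account_status(htmls):
    soup = BeautifulSoup(htmls, "html.parser")
    table = soup.find('table', border="0", cellpadding="2")
    # find_all returns a list of tags, so the text is read from each cell in turn
    cells = table.find_all("td", {"class": "bodytext"}, text=True)
    print([cell.get_text(strip=True) for cell in cells])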

Here is a snippet of the HTML I am trying to extract data from:

Answer 1

You can use pandas.read_html():

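from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(htmls, "html.parser")
table = soup.find('table', border="0", cellpadding="2")
# read_html returns a list of DataFrames; StringIO avoids passing literal HTML, which recent pandas deprecates
df = pd.read_html(StringIO(str(table)))[0]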
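
If read_html() is not an option (it needs an extra parser such as lxml installed), the rows can also be collected by hand with BeautifulSoup. The following is only a sketch under assumptions: the HTML from the question is not shown, so SAMPLE_HTML is invented for illustration, the function name account_status_frame is hypothetical, and the English column names are simply the ones listed in the question. It assumes every data row holds exactly seven td cells with class "bodytext".

```python
from bs4 import BeautifulSoup
import pandas as pd

# Column names as listed in the question.
COLUMNS = ["Date", "Transaction", "Manifest Number", "Ship Date",
           "Payment Type", "Amount", "Prepaid Balance"]

# Invented sample; the real page layout is an assumption.
SAMPLE_HTML = """
<table border="0" cellpadding="2">
  <tr><td class="header">Date</td><td class="header">Transaction</td></tr>
  <tr>
    <td class="bodytext">01/02/2017</td><td class="bodytext">Shipment</td>
    <td class="bodytext">12345</td><td class="bodytext">01/03/2017</td>
    <td class="bodytext">Prepaid</td><td class="bodytext">-10.00</td>
    <td class="bodytext">90.00</td>
  </tr>
  <tr>
    <td class="bodytext">01/05/2017</td><td class="bodytext">Deposit</td>
    <td class="bodytext">12346</td><td class="bodytext">01/05/2017</td>
    <td class="bodytext">Check</td><td class="bodytext">50.00</td>
    <td class="bodytext">140.00</td>
  </tr>
</table>
"""


def account_status_frame(htmls):
    """Build a DataFrame from the rows of the first matching table."""
    soup = BeautifulSoup(htmls, "html.parser")
    table = soup.find("table", border="0", cellpadding="2")
    rows = []
    for tr in table.find_all("tr"):
        # Keep only the data cells; header rows use a different class here.
        cells = [td.get_text(strip=True)
                 for td in tr.find_all("td", class_="bodytext")]
        # Skip rows that do not have one value per expected column.
        if len(cells) == len(COLUMNS):
            rows.append(cells)
    return pd.DataFrame(rows, columns=COLUMNS)


df = account_status_frame(SAMPLE_HTML)
print(df)
```

All values come back as strings, so columns such as Amount would still need a conversion step (for example pd.to_numeric) before doing arithmetic on them.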

Parsing a table with BeautifulSoup
