Question

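I'm new to BeautifulSoup and I'm having trouble using it on Basketball Reference. I'm trying to store the entire advanced stats table in a pandas DataFrame, but I can't even select it. Here is my code so far: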

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd


url='http://www.basketball-reference.com/teams/ATL/2016.html'
html = urlopen(url)
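# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

soup.find_all('table', attrs={'id': 'advanced'})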

After selecting the advanced table with the code above, I can see the HTML I need, but I can't actually parse it and extract the data.

Answer 1

Find the table element and let read_html() do the parsing and DataFrame initialization:

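from io import StringIO

table = soup.find('table', attrs={'id': 'advanced'})

# read_html() returns a list of DataFrames, one per table found, so take the first
df = pd.read_html(StringIO(str(table)))[0]
print(df)  # prints a dataframe with 15 rows x 27 columns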
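
If read_html() is not an option (it needs an extra parser such as lxml installed), the same extraction can be done by hand. Below is a minimal sketch, not the answer's method: it walks the rows of the table with BeautifulSoup and builds the DataFrame directly. The SAMPLE_HTML markup, its player names and numbers, and the helper name table_to_dataframe are all made up for illustration so the snippet runs without a network request; the real page has many more rows and columns. It also assumes the first row of the table is the header.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the page source; not real data from the site.
SAMPLE_HTML = """
<table id="advanced">
  <thead><tr><th>Player</th><th>G</th><th>PER</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>81</td><td>21.3</td></tr>
    <tr><td>Player B</td><td>77</td><td>19.4</td></tr>
  </tbody>
</table>
"""


def table_to_dataframe(html, table_id):
    """Build a DataFrame from the <table> with the given id (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": table_id})
    if table is None:
        raise ValueError(f"no table with id {table_id!r} found")

    # One list of cell texts per <tr>; header cells are <th>, data cells are <td>.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    # Assumption: the first row holds the column names.
    header, body = rows[0], rows[1:]
    df = pd.DataFrame(body, columns=header)

    # Cell texts are strings; convert a column to numbers only if every value parses.
    for col in df.columns:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            pass  # leave non-numeric columns (e.g. names) as strings
    return df


df = table_to_dataframe(SAMPLE_HTML, "advanced")
print(df)
```

To use it on the real page, pass the downloaded HTML in place of SAMPLE_HTML. The explicit None check is there because find() returns None when no matching table exists, which otherwise surfaces later as a confusing error.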
