Question

I have the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2017_standings.html#all_expanded_standings'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')

print(len(soup.findAll('table')))
print(soup.findAll('table'))

The page has 6 tables, but only 4 are returned. I tried using 'html.parser' and 'html5lib' as the parser, but neither worked.

Any idea how to get the "Expanded Standings" table from the page?

Thanks!

Answer 1

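requests (like urlopen in the question) cannot fetch data that is loaded by JavaScript, so you have to use Selenium. First install Selenium with pip (pip install selenium), then download ChromeDriver and put that file in your working directory. Then try the following code.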

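from bs4 import BeautifulSoup
import time
from selenium import webdriver

url = "https://www.basketball-reference.com/leagues/NBA_2017_standings.html"
browser = webdriver.Chrome()

# Load the page in a real browser so its JavaScript runs
browser.get(url)
# Give the scripts a few seconds to finish building the tables
time.sleep(3)
# Parse the rendered HTML, not the raw response
html = browser.page_source
soup = BeautifulSoup(html, "lxml")

print(len(soup.find_all("table")))
print(soup.find("table", {"id": "expanded_standings"}))

# quit() closes every window and ends the driver session
browser.quit()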

See the Selenium documentation.
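The fixed time.sleep(3) in the code above can be too short on a slow connection and wasteful on a fast one. Selenium ships its own explicit-wait helper (WebDriverWait), which would be the usual choice. The sketch below is a hand-rolled alternative that relies only on the page_source attribute, so it can be tried without a browser. The function name wait_for_markup and its parameters are hypothetical, and the search string assumes the rendered HTML writes the attribute as id="expanded_standings" with double quotes.

```python
import time


def wait_for_markup(browser, needle, timeout=10.0, poll_interval=0.5):
    """Poll browser.page_source until `needle` appears or `timeout` seconds pass.

    `browser` is any object exposing a `page_source` string attribute,
    such as a Selenium WebDriver. Returns the page source that contains
    `needle`; raises TimeoutError if it never shows up.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = browser.page_source
        if needle in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{needle!r} not found within {timeout} seconds")
        # Sleep briefly so the loop does not hammer the browser
        time.sleep(poll_interval)
```

In the answer's script, html = wait_for_markup(browser, 'id="expanded_standings"') could stand in for the time.sleep(3) and html = browser.page_source pair.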

If you are on Linux and get the error Chromedriver executable needs to be in the PATH, try following these: link-1, link-2
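One possible way to approach that PATH error from inside the script is to check whether the driver can be found and, if it sits in a known directory, prepend that directory to PATH before creating the browser. This is only a sketch: ensure_on_path is a hypothetical helper, it assumes the driver file already has execute permission (chmod +x), and whether it resolves the error may depend on the Selenium version in use.

```python
import os
import shutil


def ensure_on_path(executable, directory):
    """Make `executable` discoverable via PATH, using `directory` as a fallback.

    Returns the resolved path, or None if the executable is neither on
    PATH already nor present (and executable) in `directory`.
    """
    found = shutil.which(executable)
    if found:
        return found
    # Only touch PATH if the file really is in the fallback directory
    if shutil.which(executable, path=directory) is None:
        return None
    # Prepend so that processes started later inherit the new PATH
    os.environ["PATH"] = (
        os.path.abspath(directory) + os.pathsep + os.environ.get("PATH", "")
    )
    return shutil.which(executable)
```

With the driver in the working directory, as the answer suggests, calling ensure_on_path("chromedriver", ".") before webdriver.Chrome() would be the intended use; a None result means the file is missing or not executable.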

BeautifulSoup: fetching dynamic table data
