Question

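I'm a beginner at both web scraping and Python in general, so I apologize if the answer is obvious, but I can't figure out why I'm unable to find any table elements on https://www.hockey-reference.com/leagues/NHL_2018.html

My initial thought was that this was the result of the entire div being commented out. Following some advice I found in another similar post, I replaced the comment characters and confirmed they were removed by saving soup.text to a text file and searching it. I still could not find any tables.

To widen the search, I dropped the ID from .find and ran a findAll instead, but the result is still empty.

Here is the code I am trying to use. Any advice is much appreciated!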

import csv
import requests
from BeautifulSoup import BeautifulSoup
import re

comm = re.compile("<!--|-->")

url = 'https://www.hockey-reference.com/leagues/NHL_2018.html'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(comm.sub("", html))
table = soup.find('table', id="stats")

When searching for all table elements, I am using

table = soup.findAll('table')

I also know there is a CSV version on the site; I'm just eager to practice.

Answer 1

Pass a parser along with your markup, e.g. BeautifulSoup(html, 'lxml'). Try the code below:

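import requests
from bs4 import BeautifulSoup  # Beautiful Soup 4; the 'lxml' parser also needs the lxml package

url = 'https://www.hockey-reference.com/leagues/NHL_2018.html'
response = requests.get(url)
html = response.content
# Name the parser explicitly instead of relying on the default
soup = BeautifulSoup(html, 'lxml')
table = soup.findAll('table')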
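If some tables still do not show up, the question's suspicion about commented-out markup is worth checking directly. Below is a minimal sketch of one way to do that with Beautiful Soup 4's Comment class instead of a regex. The SAMPLE_HTML string and the find_all_tables helper are hypothetical stand-ins made up for illustration (they are not taken from the real page), and the sketch uses the built-in 'html.parser' so that it runs without lxml.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the downloaded page: one table in the live
# markup and one wrapped in an HTML comment.
SAMPLE_HTML = """
<div id="all_standings"><table id="standings"><tr><td>1</td></tr></table></div>
<div id="all_stats"><!--
<table id="stats"><tr><td>2</td></tr></table>
--></div>
"""


def find_all_tables(html, parser="html.parser"):
    """Return tables from the live markup plus any found inside comments."""
    soup = BeautifulSoup(html, parser)
    tables = list(soup.find_all("table"))
    # A comment is a single string node to the parser, so its contents
    # have to be parsed a second time to become searchable tags.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if "<table" in comment:
            tables.extend(BeautifulSoup(comment, parser).find_all("table"))
    return tables


table_ids = [t.get("id") for t in find_all_tables(SAMPLE_HTML)]
print(table_ids)
```

On the sample string this finds both the visible table and the one inside the comment. With the real page, you would pass response.content in place of SAMPLE_HTML; whether the table with id "stats" actually sits inside a comment there is an assumption based on the question, not something verified here.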
