Question

I've been using BeautifulSoup for a while and haven't had many problems with it. But now I'm trying to scrape a site that is giving me some trouble. My code is this:

    import requests
    from bs4 import BeautifulSoup
    currUrl = 'https://www.betbrain.com/football/world/'
    preSoup = requests.get(currUrl)
    print(currUrl)
    soup = BeautifulSoup(preSoup.content, "lxml")
    print(soup)

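What I get back looks like some kind of script and/or API that the page connects to, not the real content of the web page I see in the browser. For example, I can't get the matches. Does anyone know a way around this? Thanks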

Answer 1

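Well, requests only fetches the HTML and doesn't run the JavaScript. You have to use a webdriver; you can use Chrome, Firefox, etc. I use PhantomJS because it is a "headless" browser that runs in the background. Below you'll find some sample code that should help you understand how to use it: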

    from bs4 import BeautifulSoup
    import time
    from selenium import webdriver

    # PhantomJS is a headless browser, so no window is opened
    driver = webdriver.PhantomJS()
    driver.get("https://www.betbrain.com/football/world/")
    time.sleep(5)  # give the page some time to run its JavaScript
    html = driver.page_source  # the DOM after the scripts have run
    driver.quit()
    soup = BeautifulSoup(html, 'lxml')
    for i in soup.find_all("span", {"class": "Participant1"}):
        print(i.text)
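
A possible variation on the code above, as a rough sketch rather than a drop-in fix: instead of a fixed `time.sleep(5)`, wait explicitly until the element shows up, and use headless Chrome, since PhantomJS support was removed in Selenium 4. This assumes Selenium 4 with a working Chrome and chromedriver setup, and that the matches still sit in `span.Participant1` as in the answer. The helper names `fetch_rendered_html` and `extract_participants` are made up for illustration.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in headless Chrome and return the DOM once css_selector appears.

    Hypothetical helper: assumes Selenium 4 and a local Chrome install.
    """
    # Selenium is imported here so the parsing helper below works without it.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element exists; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def extract_participants(html):
    """Return the text of every span whose classes include Participant1."""
    # "html.parser" is the stdlib parser; the answer's "lxml" would work too.
    soup = BeautifulSoup(html, "html.parser")
    return [
        span.get_text(strip=True)
        for span in soup.find_all("span", class_="Participant1")
    ]
```

Usage would look like `extract_participants(fetch_rendered_html("https://www.betbrain.com/football/world/", "span.Participant1"))`. Keeping the parsing separate from the browser step also lets you try the selector against saved HTML without launching a browser.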
