Question

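I want to scrape some specific data from a website using urllib and BeautifulSoup. I am trying to get the text "190.0 kg". As you can see in my code, I tried attrs={'class': 'col-md-7'}, but this returns the wrong result. Is there a way to specify that I want the text between the <h3> tags?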

from urllib.request import urlopen
from bs4 import BeautifulSoup

# specify the url
quote_page = 'https://styrkeloft.no/live.styrkeloft.no/v2/?test-stevne'

# query the website and return the html to the variable 'page'    
page = urlopen(quote_page)

# parse the html using beautiful soup     
soup = BeautifulSoup(page, 'html.parser')

# find the <div> with class 'col-md-7' and get its text
weight_box = soup.find('div', attrs={'class': 'col-md-7'})

weight = weight_box.text.strip()
print(weight)

Answer 1

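Because this content is generated dynamically in the browser, it is not present in the HTML returned by a plain HTTP fetch, so you cannot get it with urllib or the requests module.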
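
One way to check this is to parse the HTML as the server delivers it and see whether the value is there. The sketch below uses a made-up page skeleton: the `RAW_HTML` string and its script are assumptions for illustration, not the real page source. In it, the `current_lifter` container exists but stays empty until a script fills it in.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML as a plain HTTP fetch might return it: the container
# exists, but the script that fills it has not run.
RAW_HTML = """
<html><body>
  <div id="current_lifter"></div>
  <script>/* fills #current_lifter in the browser */</script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")
container = soup.find("div", {"id": "current_lifter"})

# The container is found, but it holds no <h3> elements yet.
headings = container.find_all("h3")
print(len(headings))
```

If the count is zero for the real page as well, the value has to come from a rendered page rather than from the raw response.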

You can use Selenium WebDriver for this task:

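from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

# run Chrome without opening a window
chrome_options = Options()
chrome_options.add_argument("--headless")

# path to the chromedriver executable on your machine
chrome_driver = "path_to_chromedriver"

# Selenium 4 takes the driver path via Service and the options via `options`
driver = webdriver.Chrome(service=Service(chrome_driver), options=chrome_options)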
driver.get('https://styrkeloft.no/live.styrkeloft.no/v2/?test-stevne')
html = driver.page_source
soup = BeautifulSoup(html, "lxml")
current_lifter = soup.find("div", {"id":"current_lifter"})
value = current_lifter.find_all("div", {'class':'row'})[2].find_all("h3")[0].text
driver.quit()

print(value)

Make sure the chromedriver executable is available on your machine.
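
The selection step can be tried without a browser. The following is a minimal sketch that assumes the rendered markup resembles the `RENDERED_HTML` string below. That string, the placeholder row contents, and the helper name `third_row_heading` are all hypothetical, and the real page may be laid out differently. It narrows the search to the `current_lifter` container, picks the third `row`, and reads the `<h3>` inside it, which is how to get the text between the `<h3>` tags rather than the whole `col-md-7` div.

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup; the real page may differ.
RENDERED_HTML = """
<div id="current_lifter">
  <div class="row"><div class="col-md-7"><h3>Lifter name</h3></div></div>
  <div class="row"><div class="col-md-7"><h3>Club</h3></div></div>
  <div class="row"><div class="col-md-7"><h3>190.0 kg</h3></div></div>
</div>
"""


def third_row_heading(html):
    """Return the text of the first <h3> in the third 'row' div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"id": "current_lifter"})
    if container is None:
        return None
    rows = container.find_all("div", {"class": "row"})
    if len(rows) < 3:
        return None
    heading = rows[2].find("h3")
    return heading.get_text(strip=True) if heading else None


print(third_row_heading(RENDERED_HTML))
```

Returning None instead of indexing directly avoids an IndexError or AttributeError when the page has not finished rendering or its layout changes.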
