Python BeautifulSoup soup.find

Time: 2018-03-14 16:48:47

Tags: python python-3.x beautifulsoup urllib

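I want to scrape some specific data from a website using urllib and BeautifulSoup. I am trying to get the text "190.0 kg". As you can see in my code, I tried using attrs={'class': 'col-md-7'}, but this returns the wrong result. Is there a way to specify that I want it to return the text between the <h3> tags?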


from urllib.request import urlopen
from bs4 import BeautifulSoup

# specify the url
quote_page = 'https://styrkeloft.no/live.styrkeloft.no/v2/?test-stevne'

# query the website and return the html to the variable 'page'
page = urlopen(quote_page)

# parse the html using beautiful soup
soup = BeautifulSoup(page, 'html.parser')

# take out the first <div> with class col-md-7 and get its text
weight_box = soup.find('div', attrs={'class': 'col-md-7'})

weight = weight_box.text.strip()
print(weight)
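Why the col-md-7 lookup returns the wrong text: soup.find() stops at the first element that matches, so if several boxes share that class you get whichever comes first. The sketch below uses hypothetical, simplified markup (the id current_lifter and the row layout are borrowed from the answer below; "Lifter name" and "Club name" are placeholders, not real page content) to contrast find() with a search scoped to one container and narrowed to its <h3> tags.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page's structure may differ
html = """
<div id="current_lifter">
  <div class="row"><div class="col-md-7"><h3>Lifter name</h3></div></div>
  <div class="row"><div class="col-md-7"><h3>Club name</h3></div></div>
  <div class="row"><div class="col-md-7"><h3>190.0 kg</h3></div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the FIRST div with this class
first_box = soup.find("div", attrs={"class": "col-md-7"})
print(first_box.h3.get_text(strip=True))  # Lifter name

# Scope the search to one container, collect every <h3>, then pick by position
headings = soup.find("div", id="current_lifter").find_all("h3")
print([h.get_text(strip=True) for h in headings])  # ['Lifter name', 'Club name', '190.0 kg']
print(headings[2].get_text(strip=True))  # 190.0 kg
```

Picking by position (headings[2]) is only as stable as the page layout; it is the same approach the answer below takes.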

1 answer:

Answer 0 (score: 0)

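Because this content is generated dynamically by JavaScript in the browser, you cannot get the data with a plain HTTP request, whether through urllib (as in your code) or the requests module.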
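A minimal illustration of why a static parser misses such content, assuming (hypothetically) that the server sends an empty container and a script fills it in later. BeautifulSoup only parses the markup it is given; it never executes <script> code, so the value never appears in the parsed tree.

```python
from bs4 import BeautifulSoup

# Hypothetical page as a plain HTTP client would receive it:
# the container is empty and a script would fill it in, but only in a browser
raw_html = """
<div id="current_lifter"></div>
<script>
  document.getElementById("current_lifter").textContent = "190.0 kg";
</script>
"""

soup = BeautifulSoup(raw_html, "html.parser")
container = soup.find("div", id="current_lifter")

# The element exists, but the script was never run, so it holds no text
print(repr(container.get_text(strip=True)))  # ''
print(container.find("h3"))  # None
```

A quick check on the real page is to search the raw HTML returned by urlopen for the value you expect; if it is absent there but visible in the browser, it is being added by JavaScript.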

You can use Selenium WebDriver for this task:

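from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

# run Chrome without opening a browser window
chrome_options = Options()
chrome_options.add_argument("--headless")

# path to the chromedriver executable on your machine
chrome_driver = "path_to_chromedriver"

# Selenium 4 takes the driver path via a Service object; the old
# chrome_options= / executable_path= keywords are removed in recent releases
driver = webdriver.Chrome(service=Service(chrome_driver), options=chrome_options)
try:
    driver.get('https://styrkeloft.no/live.styrkeloft.no/v2/?test-stevne')
    # page_source is the current DOM as rendered by the browser,
    # including content added by the page's JavaScript
    html = driver.page_source
finally:
    driver.quit()  # always close the browser, even if loading fails

soup = BeautifulSoup(html, "lxml")  # "lxml" requires the lxml package
current_lifter = soup.find("div", {"id": "current_lifter"})
# the weight is the first <h3> in the third "row" div of #current_lifter
value = current_lifter.find_all("div", {"class": "row"})[2].find_all("h3")[0].text

print(value)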
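The [2] index above breaks if the page adds or reorders a row. A possible alternative is to match the <h3> by its text instead. The helper below, find_weight, is hypothetical; it assumes the weight is shown as a number followed by "kg", as in the question, so the pattern may need adjusting for the live page. It takes an HTML string, so you would pass it driver.page_source.

```python
import re
from bs4 import BeautifulSoup

# Assumed format: a number with optional decimals, then "kg" (e.g. "190.0 kg")
WEIGHT_RE = re.compile(r"^\d+(?:\.\d+)?\s*kg$", re.IGNORECASE)


def find_weight(html):
    """Hypothetical helper: return the first <h3> text inside #current_lifter
    that looks like a weight, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="current_lifter")
    if container is None:
        return None  # page not rendered yet, or the id is different
    for h3 in container.find_all("h3"):
        text = h3.get_text(strip=True)
        if WEIGHT_RE.match(text):
            return text
    return None


# Hypothetical markup standing in for driver.page_source
sample = (
    '<div id="current_lifter">'
    '<div class="row"><h3>Lifter name</h3></div>'
    '<div class="row"><h3>190.0 kg</h3></div>'
    "</div>"
)
print(find_weight(sample))  # 190.0 kg
```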

Make sure the chromedriver executable is installed on your machine and that chrome_driver points to it.