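I'm trying to scrape titles from a website, but it only returns one title. How can I get all of the titles?
Below is one of the elements I'm trying to get using XPath (starts-with):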
<div id="post-4550574" class="post-box " data-permalink="https://hypebeast.com/2019/4/undercover-nike-sfb-mountain-sneaker-release-info" data-title="The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date"><div class="post-box-image-container fixed-ratio-3-2">
Here is my current code:
from selenium import webdriver
import requests
from bs4 import BeautifulSoup as bs
driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
driver.get('https://hypebeast.com/search?s=nike+undercover')
element = driver.find_element_by_xpath(".//*[starts-with(@id, 'post-')]")
print(element.get_attribute('data-title'))
Output:
The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date
I was expecting more titles, but only one result is returned.
Answer 0 (score: 1)
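To extract the product titles from the website: because the desired elements are JavaScript-enabled elements, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies. Both snippets assume that driver is an existing WebDriver instance and that WebDriverWait, By, and expected_conditions (as EC) have already been imported from Selenium.
XPATH: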
driver.get('https://hypebeast.com/search?s=nike+undercover')
print([element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2/span")))])
CSS_SELECTOR:
driver.get('https://hypebeast.com/search?s=nike+undercover')
print([element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h2>span")))])
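To make clearer what the explicit wait above is doing: WebDriverWait repeatedly calls the given condition (every 0.5 seconds by default) until it returns something truthy or the timeout expires. The following is only a rough, Selenium-free sketch of that polling idea, not Selenium's actual implementation; the name `wait_until` and its parameters are made up for illustration.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper that mimics the polling behaviour of an explicit wait.
    Raises TimeoutError if no truthy result arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # e.g. a non-empty list of elements: stop waiting and return it
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With Selenium, the condition would be something like EC.visibility_of_all_elements_located(...) applied to the driver, which is why the list comprehension in the answer only runs once the elements are actually visible.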
Answer 1 (score: 1)
You don't need Selenium. You can use requests, which is faster, and target the data-title attribute:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://hypebeast.com/search?s=nike+undercover')
soup = bs(r.content, 'lxml')
titles = [item['data-title'] for item in soup.select('[data-title]')]
print(titles)
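If you want to check the attribute-collection logic without hitting the site or installing lxml, here is a minimal offline sketch that uses only the standard library's html.parser instead of BeautifulSoup. The class name `DataTitleCollector`, the function `extract_titles`, and the sample HTML (including its titles) are invented for illustration; only the data-title attribute and the post-box class come from the question.

```python
from html.parser import HTMLParser


class DataTitleCollector(HTMLParser):
    """Collects the value of every data-title attribute it encounters."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag
        for name, value in attrs:
            if name == "data-title" and value:
                self.titles.append(value)


# Made-up markup shaped like the element shown in the question
SAMPLE_HTML = """
<div id="post-1" class="post-box " data-title="First sample title"></div>
<div id="post-2" class="post-box " data-title="Second sample title"></div>
<div id="sidebar">no title here</div>
"""


def extract_titles(html):
    """Return all data-title values found in the given HTML string."""
    parser = DataTitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


print(extract_titles(SAMPLE_HTML))
```

The point is the same as in the requests version: collect every element carrying data-title rather than stopping at the first match.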
If you do want Selenium, the matching syntax is:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://hypebeast.com/search?s=nike+undercover')
titles = [item.get_attribute('data-title') for item in driver.find_elements_by_css_selector('[data-title]')]
print(titles)
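Answer 2 (score: 0)
If the locator matches multiple elements, find_element returns only the first one. find_elements returns a list of all the elements the locator matches.
You can then iterate over the list and read each element.
If all the elements you are looking for have the class post-box, you can locate them by class name.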
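As a sketch of the above, here is how locating by class name might look with the Selenium 4 locator style (find_elements(By..., ...)); the find_element_by_* and find_elements_by_* helpers used elsewhere in this thread were removed in Selenium 4.3. The function names `titles_from` and `scrape_titles` are made up for this illustration, and the Selenium imports sit inside `scrape_titles` only so that the helper can be tried without a browser.

```python
def titles_from(elements):
    """Return the data-title of every element that has one."""
    titles = []
    for element in elements:
        title = element.get_attribute("data-title")
        if title:  # skip elements without the attribute
            titles.append(title)
    return titles


def scrape_titles(url):
    """Open url in Chrome and collect data-title from all .post-box elements."""
    # Imported here (an illustrative choice) so titles_from works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        # find_elements returns a list of ALL matches (empty if there are none);
        # find_element would return only the first match.
        posts = driver.find_elements(By.CLASS_NAME, "post-box")
        return titles_from(posts)
    finally:
        driver.quit()  # always release the browser
```

Calling scrape_titles('https://hypebeast.com/search?s=nike+undercover') should then give a list of titles rather than a single string, assuming the page markup still matches the element shown in the question.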
Answer 3 (score: 0)
Just sharing my experience and what I have used, in case it helps someone. Simply use:
element.get_attribute('ATTRIBUTE-NAME')