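I want to extract the different titles from each row of this page:
https://trends.google.com/trends/trendingsearches/realtime?geo=AR&category=all
I have made a few attempts without success. I assumed that finding the elements by class name would give me the text I need: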
from selenium import webdriver
driver=webdriver.Chrome('path to bin')
driver.get('https://trends.google.com/trends/trendingsearches/realtime?geo=AR&category=all')
hrefs = driver.find_elements_by_class_name('title')
print(hrefs)  # prints the WebElement objects themselves, not their text
print(len(hrefs))
driver.quit()
Thanks in advance! 琼
Answer 0 (score: 3)
You were so close! You only need to read the text of each title element. Try this:
from selenium import webdriver
driver=webdriver.Chrome('path to bin')
driver.get('https://trends.google.com/trends/trendingsearches/realtime?geo=AR&category=all')
Titles = driver.find_elements_by_class_name('title')
for title in Titles:
    print(title.text)
driver.quit()
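A note for readers on newer Selenium versions: the `find_elements_by_*` helpers used above were removed in later Selenium 4 releases, where the generic `find_elements(by, value)` call is used instead. The sketch below wraps that call; the function name `title_texts` is made up for illustration, and it assumes the rows still carry the class `title`. The string `"class name"` is the value behind `By.CLASS_NAME`, used here so the snippet has no import of its own.

```python
def title_texts(driver, class_name="title"):
    """Return the non-empty text of every element with the given class.

    `driver` is assumed to be a Selenium WebDriver (or any object that
    exposes a compatible `find_elements(by, value)` method).
    """
    # "class name" is the locator-strategy string that By.CLASS_NAME holds.
    elements = driver.find_elements("class name", class_name)
    # Skip elements whose text is empty or whitespace only.
    return [el.text for el in elements if el.text.strip()]
```

With a real browser session this would be called as `for text in title_texts(driver): print(text)` after `driver.get(...)`.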
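Answer 1 (score: 1)
@PixelEinstein's answer does exactly what you asked. As a matter of best practice, though, you should always maximize the browser window and use WebDriverWait to wait until the elements are visible before extracting their text, as follows:
Code block: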
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://trends.google.com/trends/trendingsearches/realtime?geo=AR&category=all')
titles = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='title']")))
for title in titles:
    print(title.text)
driver.quit()
Console output:
Mauricio Macri • Cyst • Pancreas
Abortion • National Congress of Argentina • Debate
Abortion • Mayra Mendoza • Argentine Chamber of Deputies • Deputy
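Since the question asks for the different titles in each row, and each line of the output above joins several topics with a bullet character, a small follow-up step can split them apart. This is only a sketch: `split_topics` is a hypothetical helper, and it assumes the " • " separator seen in the output stays the same.

```python
def split_topics(row_text, separator="•"):
    """Split one row of title text into its individual topics."""
    # Strip the whitespace around each piece and drop empty pieces.
    return [part.strip() for part in row_text.split(separator) if part.strip()]


# Sample rows copied from the console output above.
rows = [
    "Mauricio Macri • Cyst • Pancreas",
    "Abortion • National Congress of Argentina • Debate",
]
for row in rows:
    print(split_topics(row))
```

In the scraping loop, the same helper could be applied to `title.text` in place of printing the raw string.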