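I am trying to scrape Medium posts and their content. Everything seems fine: the code runs, opens the browser, and goes straight to the specified URL. But the output screen should then show the post titles, content, author names, and the other printed values.
All the class names are correct as well. I then thought the cause might be that the dynamic content never ends, so I set a limit on the output variables, but still no output is shown.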
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
option = webdriver.ChromeOptions()
browser = webdriver.Chrome(
    executable_path=r"C:/Users/Jai Sipani/Downloads/chrome_driver/chromedriver.exe",
    chrome_options=option)
browser.get("https://medium.com/topic/startups")
# Wait 60 seconds for page to load
timeout = 60
try:
    WebDriverWait(browser, timeout).until(
        EC.visibility_of_element_located(
            (By.XPATH, "//img[@class='n dx dy dz ea ed y']")))
except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()
titles_heading = browser.find_elements_by_class_name(
    "ar aj da bc db bd em gb gc at aw eo dg dh av")
titles_heading = titles_heading[:10]
titles = [x.text for x in titles_heading]
print('titles:')
print(titles, '\n')
titles_desc = browser.find_element_by_class_name(
    "bh bi bc b bd be bf bg at aw dj dg dh av ef ep")
titles_desc = titles_desc[:10]
desc = [i.text for i in titles_desc]
print('desc:')
print(desc, '\n')
authors = browser.find_element_by_class_name(
    "bc b bd be bf bg at aw dj dg dh av ar aj")
authors = authors[:10]
author = [x.text for x in authors]
print('author: ')
print(author, '\n')
timeline = browser.find_element_by_class_name("fg ae fh")
timeline = timeline[:10]
time = [x.text for x in timeline]
print('time: ')
print(time, '\n')
for title, desc, author, time in zip(titles, titles_desc, authors, timeline):
    print("Title : title_Desc : authors : timeline")
    print(title + ": " + desc + ": " + author + ": " + time, '\n')
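I expect the output to be a printed list of the posts and their content, but I get nothing. The script simply closes the session after 60 seconds.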
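
One possible cause, offered as a guess rather than a confirmed diagnosis: `find_elements_by_class_name` expects a single class name, so a string holding several space-separated classes is unlikely to match as intended. A minimal sketch of one workaround is to turn the class string into a CSS selector that requires every class. `class_names_to_css` is a hypothetical helper, not part of Selenium, and it assumes the generated Medium class names are still the ones present on the page.

```python
def class_names_to_css(class_attr: str) -> str:
    """Build a CSS selector that requires every class in a
    space-separated class attribute (hypothetical helper)."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b.c" matches elements that carry all of a, b and c.
    return "".join("." + name for name in names)


selector = class_names_to_css("fg ae fh")
print(selector)  # .fg.ae.fh

# With a live driver (not run here), the selector could be used as:
#   elements = browser.find_elements(By.CSS_SELECTOR, selector)
#   texts = [el.text for el in elements[:10]]
```

Two related points may also matter here. `find_element_by_class_name` (singular) returns one element rather than a list, so slicing it with `[:10]` needs the plural `find_elements` form. The final loop zips `titles_desc`, `authors`, and `timeline`, which hold elements rather than the extracted text lists `desc`, `author`, and `time`.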