我找到了许多滚动整个网页的引用,但我正在寻找一个特定的部分来滚动。我正在marketwatch.com上工作 - 部分 - 最新新闻标签。如何使用selenium webdriver滚动这个最新的新闻标签?
下面是我的代码,它返回新闻的标题,但不断重复相同的标题。
from bs4 import BeautifulSoup
import urllib
import csv
import time
from selenium import webdriver
count = 0
browser = webdriver.Chrome()
browser.get("https://www.marketwatch.com/newsviewer")
pageSource = browser.page_source
soup = BeautifulSoup(pageSource, 'lxml')
arkodiv = soup.find("ol", class_="viewport")
while browser.find_element_by_tag_name('ol'):
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(0.5)
div = list(arkodiv.find_all('div', class_= "nv-details"))
heading = []
Data_11 = list(soup.find_all("div", class_ = "nv-text-cont"))
datetime = list(arkodiv.find_all("li", timestamp = True))
for sa in datetime:
sh = sa.find("div", class_ = "nv-text-cont")
if sh.find("a", class_ = True):
di = sh.text.strip()
di = di.encode('ascii', 'ignore').decode('ascii')
else:
continue
print di
heading.append((di))
count = count+1
if 'End of Results' in arkodiv:
print 'end'
break
else:
continue
print count
答案 0 :(得分:2)
这是因为您正在执行的脚本滚动到页面底部。
要继续滚动元素获取新闻,您需要替换它:
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
用这个:
browser.execute_script("document.documentElement.getElementsByClassName('viewport')[0].scrollTop = 999999")
修改强>
这是完整的解决方案:
from bs4 import BeautifulSoup
import urllib
import csv
import time
from selenium import webdriver
count = 0
browser = webdriver.Chrome()
browser.get("https://www.marketwatch.com/newsviewer")
while browser.find_element_by_tag_name('ol'):
pageSource = browser.page_source
soup = BeautifulSoup(pageSource, 'lxml')
arkodiv = soup.find("ol", class_="viewport")
browser.execute_script("document.documentElement.getElementsByClassName('viewport')[0].scrollTop = 999999")
time.sleep(0.5)
div = list(arkodiv.find_all('div', class_= "nv-details"))
heading = set()
Data_11 = list(soup.find_all("div", class_ = "nv-text-cont"))
datetime = list(arkodiv.find_all("li", timestamp = True))
for sa in datetime:
sh = sa.find("div", class_ = "nv-text-cont")
if sh.find("a", class_ = True):
di = sh.text.strip()
di = di.encode('ascii', 'ignore').decode('ascii')
else:
continue
print di
heading.add((di))
count = count+1
if 'End of Results' in arkodiv:
print 'end'
break
else:
continue
print count
编辑2
您可能还想更改标题的存储方式,因为您当前的方式会在列表中保留重复项。将其更改为set
,以免发生。