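I've written a script in Python with Selenium to scrape the different reviewers connected to each item name on a webpage. Some items show reviewers when the see more button is clicked, and some have no reviewers at all.
I've tried to write the script so that it gets all the item links from the landing page, then visits each link, clicks the review tab, clicks the see more button, collects the reviewers, and repeats the same steps until no items are left.
The main issue is that when the script clicks the see more button, it opens a new tab containing the reviewers.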
Link to one of such item containing reviews
Link to the page containing full reviews
This is my attempt so far:

from urllib.parse import urljoin
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://eatstreet.com/madison-wi/restaurants"

def get_information(driver, link):
    driver.get(link)
    # collecting all the links connected to item names
    itemlinks = [urljoin(url, item.get_attribute("href")) for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a.restaurant-header")))]
    for itemlink in itemlinks:
        driver.get(itemlink)
        # check whether there is any review
        revitem = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "label[for='reviews']")))
        if revitem and (revitem.text != "Reviews (0)"):
            # remember the main tab before the click opens a new one
            current = driver.current_window_handle
            wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "label[for='reviews']"))).click()
            wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'See More Reviews'))).click()
            wait.until(EC.new_window_is_opened)
            driver.switch_to.window([window for window in driver.window_handles if window != current][0])
            # print reviewer names page by page until there is no next page
            while True:
                for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'ul.reviews div.review .review-sidebar #dropdown_user-name'))):
                    print(item.text)
                try:
                    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".pagination-block a.next"))).click()
                    wait.until(EC.staleness_of(item))
                except Exception:
                    break
            driver.switch_to.default_content()

if __name__ == '__main__':
    options = Options()
    options.add_argument("--disable-notifications")
    driver = webdriver.Chrome(options=options)
    wait = WebDriverWait(driver, 10)
    try:
        get_information(driver, url)
    finally:
        driver.quit()

My script above can collect the names of the reviewers from the first available item that has reviews, but when it is supposed to move on to the next item and collect its reviewers, it throws a timeout exception error. This probably happens because the script is still focused on the newly opened tab when it tries to repeat the action.
The image below shows how the See More button appears:
Answer 0 (score: 2)
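If you need to close the new window and switch back to the initial window, try replacing
driver.switch_to.default_content()
with
driver.close()
driver.switch_to.window(current)
Note that switch_to.default_content() only leaves frames inside the currently focused window; it does not change which window or tab the driver is focused on.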
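
The close-and-switch-back step above can be wrapped so it also runs when scraping the new tab raises an exception. The following is only a minimal sketch, not part of the original answer: the helper name switched_to_new_window is hypothetical, and it assumes the driver object exposes window_handles, switch_to.window() and close() the way a Selenium WebDriver does.

```python
from contextlib import contextmanager


@contextmanager
def switched_to_new_window(driver, original_handle):
    """Hypothetical helper: focus the first window other than
    `original_handle`, then close it and return to `original_handle`
    when the `with` block exits, even if the block raises."""
    new_handles = [h for h in driver.window_handles if h != original_handle]
    if not new_handles:
        raise RuntimeError("no new window was opened")
    driver.switch_to.window(new_handles[0])
    try:
        yield driver
    finally:
        # close() closes the window that currently has focus (the new tab);
        # the driver must then be pointed back at the original window.
        driver.close()
        driver.switch_to.window(original_handle)
```

In the question's loop, this would be used as `with switched_to_new_window(driver, current):` around the pagination code, in place of the separate switch_to.window(...) and switch_to.default_content() calls. Separately, EC.new_window_is_opened expects the list of window handles captured before the click, for example `wait.until(EC.new_window_is_opened(handles_before_click))`, where handles_before_click is an assumed variable holding driver.window_handles from before the See More Reviews click.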