Question

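The page I'm working with is https://web.archive.org/web/*/https://cd.lianjia.com/. I want to open the pages the Wayback Machine saved at different points in time, which appear as dots on the calendar, but View Page Source contains no href links to those time points. If I use Inspect on one of the time points, I can see the href links. I tried installing Selenium and geckodriver: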

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Tell Selenium where Firefox lives (Options.binary_location replaces
# the deprecated firefox_binary= argument)
options = Options()
options.binary_location = '/usr/bin/firefox'
browser = webdriver.Firefox(options=options)
browser.get('https://web.archive.org/web/*/https://cd.lianjia.com/')

# page_source is read right after get() returns, which may be before the
# calendar's JavaScript has added the snapshot links
page = BeautifulSoup(browser.page_source, 'html.parser')

for a in page.find_all('a', href=True):
    print("Found the URL:", a['href'])

browser.quit()
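A likely reason the links are missing is timing: browser.page_source is captured as soon as get() returns, while the calendar's snapshot links are added later by JavaScript (which would also explain why Inspect shows them but View Page Source does not). Below is a minimal sketch that polls with Selenium's WebDriverWait until snapshot-style links exist. It assumes the calendar links are ordinary <a> elements in the page DOM whose href contains /web/ followed by a 14-digit timestamp; the helper names snapshot_hrefs and rendered_snapshot_links are hypothetical.

```python
import re

# A dated snapshot link contains /web/<14-digit timestamp>/ (unlike the /web/*/ calendar URL)
SNAPSHOT_RE = re.compile(r"/web/\d{14}/")


def snapshot_hrefs(hrefs):
    """Keep hrefs that point at a dated snapshot, dropping duplicates but keeping order."""
    return list(dict.fromkeys(h for h in hrefs if h and SNAPSHOT_RE.search(h)))


def rendered_snapshot_links(url, timeout=30):
    """Load `url` in Firefox and poll until snapshot links are present in the DOM."""
    # Imported inside the function so snapshot_hrefs() works without Selenium installed
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        wait = WebDriverWait(browser, timeout,
                             ignored_exceptions=(StaleElementReferenceException,))
        # until() keeps calling the lambda while it returns an empty (falsy) list,
        # and raises TimeoutException if no snapshot link appears within `timeout`
        return wait.until(lambda d: snapshot_hrefs(
            a.get_attribute("href") for a in d.find_elements(By.TAG_NAME, "a")))
    finally:
        browser.quit()
```

Usage: rendered_snapshot_links('https://web.archive.org/web/*/https://cd.lianjia.com/') returns the collected hrefs. The wait stops as soon as any matching link exists, so it may return before every calendar entry has been rendered; a longer fixed condition (for example, a minimum number of links) would be a reasonable variation.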

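With my code, those hyperlinks still don't show up. So I turned to a Selenium tutorial, but the very first step already throws an error, and it's driving me crazy: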

from seleniumcrawler import handle_url

ModuleNotFoundError: No module named 'seleniumcrawler'

I'm sure Selenium is installed, because my first script runs and selenium also appears in pip list. Please help me, thank you very much.
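The error does not depend on whether Selenium is installed: pip install selenium provides a top-level package named selenium, not seleniumcrawler. An import of seleniumcrawler only succeeds if a module or package with that name is on the import path, presumably a file from the tutorial's own project rather than something pip installs. A quick sketch using the standard library's importlib.util.find_spec to check which names Python can actually import (is_importable is a hypothetical helper name):

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module named `module_name` can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Check the installed package and the module the tutorial expects
for name in ("selenium", "seleniumcrawler"):
    print(f"{name}: {'importable' if is_importable(name) else 'not found'}")
```

If selenium is reported as importable and seleniumcrawler as not found, the tutorial's seleniumcrawler module needs to be obtained or written and placed next to the script (or on sys.path) before the import can work.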

How to scrape hyperlinks with Selenium

0 answers:
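As an alternative to rendering the calendar in a browser, the Wayback Machine exposes a CDX server API (https://web.archive.org/cdx/search/cdx) that lists the captures of a URL; with output=json it returns a JSON array of rows whose first row holds the field names. A minimal standard-library sketch, assuming the fl parameter is used to request just the timestamp and original fields; fetch_cdx_rows and snapshot_urls are hypothetical names:

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def fetch_cdx_rows(url, limit=100):
    """Ask the CDX server for captures of `url` (requires network access)."""
    query = urllib.parse.urlencode({
        "url": url,
        "output": "json",            # JSON array of rows; first row is the header
        "fl": "timestamp,original",  # only the fields needed to build snapshot URLs
        "limit": limit,
    })
    with urllib.request.urlopen(f"{CDX_ENDPOINT}?{query}", timeout=30) as resp:
        body = resp.read().decode("utf-8")
    # Treat an empty response body as "no captures"
    return json.loads(body) if body.strip() else []


def snapshot_urls(cdx_rows):
    """Build https://web.archive.org/web/<timestamp>/<original> for each capture row."""
    if not cdx_rows:
        return []
    header, *captures = cdx_rows
    ts_col = header.index("timestamp")
    orig_col = header.index("original")
    return [f"https://web.archive.org/web/{row[ts_col]}/{row[orig_col]}"
            for row in captures]
```

Usage: for link in snapshot_urls(fetch_cdx_rows('cd.lianjia.com')): print("Found the URL:", link). Looking columns up by name in the header row keeps snapshot_urls working even if fl is dropped and the server returns its default field list.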