requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied while trying to find broken links through Selenium and Python

Time: 2019-01-23 11:08:44

Tags: python-3.x selenium selenium-webdriver request python-requests

I want to find the broken links on my webpage using Selenium + Python. I tried the code below, but it raises the following error:

requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

Code trials:

for link in links:

    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)


2 answers:

Answer 0 (score: 1)

This error message...

    raise MissingSchema(error)
    requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

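...implies that the URL passed to requests.head() was None: at least one of the collected elements has no href attribute, so get_attribute('href') returned None, which requests turned into the string 'None'. That string has no scheme, so the scheme check fails.

This check is defined in requests' models.py as follows: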

    # Support for unicode domain names and paths.
    scheme, auth, host, port, path, query, fragment = parse_url(url)
    if not scheme:
        raise MissingSchema("Invalid URL {0!r}: No schema supplied. "
                            "Perhaps you meant http://{0}?".format(url))
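To see why the message contains the literal string 'None': before parsing, requests converts a non-bytes URL argument with str(), so a None returned by get_attribute('href') becomes 'None', which has no scheme. The sketch below mimics that check using the standard library's urllib.parse.urlsplit as a stand-in for the parse_url call requests actually uses; url_has_scheme is a hypothetical helper, not part of requests.

```python
from urllib.parse import urlsplit


def url_has_scheme(url):
    """Roughly mimic the scheme check requests performs before sending a request.

    Hypothetical helper: requests itself uses urllib3's parse_url, not urlsplit.
    """
    # requests decodes bytes and calls str() on anything else, so None -> 'None'
    text = url.decode("utf8") if isinstance(url, bytes) else str(url)
    return bool(urlsplit(text).scheme)


print(url_has_scheme(None))                           # False: 'None' has no scheme
print(url_has_scheme("https://www.seleniumhq.org/"))  # True
```

In other words, the fix is not about Unicode handling: it is about making sure every value handed to requests is an actual URL.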

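Solution

Presumably you are trying to find broken links among the search results that appear after searching for a keyword in the Google home page search box. To do that, you can use the following solution:

  • Code block: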

    import requests
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    # Selenium 4 style: driver path goes through a Service object, options via options=
    driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
    driver.get('https://google.co.in/')
    search = driver.find_element(By.NAME, 'q')
    search.send_keys("selenium")
    search.send_keys(Keys.RETURN)
    # Collect only the anchors around the result headings; these carry an href
    links = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//div[@class='rc']//h3//ancestor::a[1]")))
    print("Number of links : %s" % len(links))
    for link in links:
        href = link.get_attribute('href')
        r = requests.head(href)
        print(href, r.status_code)
    
  • Console output:

    Number of links : 9
    https://www.seleniumhq.org/ 200
    https://www.seleniumhq.org/download/ 200
    https://www.seleniumhq.org/docs/01_introducing_selenium.jsp 200
    https://www.guru99.com/selenium-tutorial.html 200
    https://en.wikipedia.org/wiki/Selenium_(software) 200
    https://github.com/SeleniumHQ 200
    https://www.edureka.co/blog/what-is-selenium/ 200
    https://seleniumhq.github.io/selenium/docs/api/py/ 200
    https://seleniumhq.github.io/docs/ 200
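If you collect every anchor on a page rather than only the result links, you can drop the values that would trigger this error before calling requests. The following is a minimal sketch, assuming a link counts as broken when the request fails or the status code is 400 or higher. checkable_urls and find_broken_links are hypothetical helper names, and fetch_status is passed in so the logic can be exercised without a network.

```python
from urllib.parse import urlsplit


def checkable_urls(hrefs):
    """Yield each distinct href that requests can fetch, in first-seen order.

    Skips None/empty values (what get_attribute('href') returns when an element
    has no href) and non-HTTP schemes such as mailto: or javascript:.
    """
    seen = set()
    for href in hrefs:
        if not href:
            continue
        if urlsplit(href).scheme not in ("http", "https"):
            continue
        if href not in seen:
            seen.add(href)
            yield href


def find_broken_links(hrefs, fetch_status):
    """Return (url, status) pairs for links considered broken.

    fetch_status(url) should return an HTTP status code or raise on failure.
    Assumption: status >= 400 means broken; a failed request is reported as None.
    """
    broken = []
    for url in checkable_urls(hrefs):
        try:
            status = fetch_status(url)
        except Exception:  # any failure in fetch_status marks the link as broken
            broken.append((url, None))
            continue
        if status >= 400:
            broken.append((url, status))
    return broken


# Demo with a fake status lookup instead of real HTTP requests.
fake_statuses = {"https://example.com/ok": 200, "https://example.com/missing": 404}
hrefs = [None, "https://example.com/ok", "mailto:someone@example.com",
         "https://example.com/missing", "https://example.com/ok"]
print(find_broken_links(hrefs, fake_statuses.__getitem__))
# [('https://example.com/missing', 404)]
```

With Selenium, you could pass [link.get_attribute('href') for link in links] as hrefs and something like lambda url: requests.head(url, allow_redirects=True, timeout=10).status_code as fetch_status. Some servers answer HEAD with 405, so a fallback to requests.get may be worth considering.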
    

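Update

As for your follow-up question, it is somewhat difficult to give a canonical answer, from a Selenium perspective, to why the XPath works but tagName doesn't. You may want to dig deeper into the related discussions on this topic.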
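One likely reason, assuming the page contains a elements without an href (for example anchors used as script-driven buttons): locating by tag name a returns those elements too, and get_attribute('href') returns None for them. By contrast, the XPath above only targets the anchors around the result headings, which in that markup carried an href. The sketch below illustrates this with the standard library's html.parser on a made-up HTML fragment; AnchorCollector is a hypothetical helper. In Selenium itself, driver.find_elements(By.CSS_SELECTOR, "a[href]") would restrict the match to anchors that actually have an href.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, mimicking a find-by-tag-name 'a' lookup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # dict(attrs).get returns None when the tag has no href, much like
            # get_attribute('href') does for such an element in Selenium.
            self.hrefs.append(dict(attrs).get("href"))


# Hypothetical page fragment: one real link and one script-driven anchor.
page = '<a href="https://www.seleniumhq.org/">Selenium</a><a onclick="next()">More</a>'
collector = AnchorCollector()
collector.feed(page)
print(collector.hrefs)                                # ['https://www.seleniumhq.org/', None]
print([h for h in collector.hrefs if h is not None])  # ['https://www.seleniumhq.org/']
```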

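Answer 1 (score: 0)

Give this a try. I'm fairly sure there is a better way to do this, and it may or may not solve your problem, but I came up with this approach and it seems to work for me.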

<setting key="MaxLoginAttempts" value="3" />
<setting key="BeginDate" value="18-3-2019" />
<setting key="UserName" value="hello" />

Output

<setting key="MaxLoginAttempts" value="3" type="int"/>
<setting key="BeginDate" value="18-3-2019" type="datetime"/>
<setting key="UserName" value="hello" type="string"/>