requests.exceptions.MissingSchema: Invalid URL ''

Time: 2019-11-12 11:39:44

Tags: python python-3.x for-loop request

Getting a MissingSchema error even though the link contains https:

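I am trying to scrape multiple Wiki pages with Python. I have a list of Wiki URLs in an Excel file.

I created a Python class that scrapes a Wiki page, and I run it inside a for loop. When I run the code without the for loop I get output, but when I put the code below inside a for loop I get the missing schema error.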

import re
from bs4 import BeautifulSoup
import requests
import xlrd

wb = xlrd.open_workbook('list.xls')
sheet = wb.sheet_by_index(0)

class wiki:

    def __init__(self,url):
        #self.name =name
        self.url = url
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")

    def urlcont (self):
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")
        print (soup.prettify())
    def head(self):
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")
        title = soup.find(class_='firstHeading').i.text 
        return title

for i in range (sheet.nrows):
    url = sheet.cell_value(i,2)
    print (url)

    data = wiki(url)
    head = data.head()
    print (head)


Error after running this code:

Traceback (most recent call last):
  File "D:\PYTHON\1click\final\alex.py", line 177, in <module>
    movie = wikimovie(movieurl)
  File "D:\PYTHON\1click\final\alex.py", line 69, in __init__
    cont = requests.get(self.url, timeout=5)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 519, in request
    prep = self.prepare_request(req)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 452, in prepare_request
    p.prepare(
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 313, in prepare
    self.prepare_url(url, params)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 387, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?
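The last line of the traceback shows that the value passed to `requests.get` was an empty string (`Invalid URL ''`), not one of the `https://en.wikipedia.org/wiki/...` links. A minimal sketch of a guard that could run before each request, assuming only absolute http/https URLs should be fetched; `has_http_scheme` is a hypothetical helper built on the standard library's `urllib.parse.urlparse`:

```python
from urllib.parse import urlparse


def has_http_scheme(url):
    """Return True if url is a string with an http/https scheme and a host.

    Hypothetical helper: requests raises MissingSchema when the URL has no
    scheme, which is what happens for an empty string ''.
    """
    # xlrd returns numeric cells as floats, so reject anything that is not text.
    if not isinstance(url, str):
        return False
    parts = urlparse(url.strip())
    # Require both a scheme ("http"/"https") and a network location (host).
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Because `wiki.__init__` already calls `requests.get`, such a check would have to happen before `wiki(url)` is constructed, for example `if not has_http_scheme(url): continue` at the top of the loop body.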

Output when the for loop is removed:

https://en.wikipedia.org/wiki/######

######

Output when printing all the URLs with the for loop, without calling the class:

https://en.wikipedia.org/wiki/######
https://en.wikipedia.org/wiki/######
https://en.wikipedia.org/wiki/######

The output without the for loop is what I get when I drop the loop and replace the variable i in the line url = sheet.cell_value(i,2) with some other arbitrary value.
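Since a single hard-coded row works but the loop over every row fails, one likely explanation (not confirmed in the post) is that at least one row has an empty cell in column index 2, for example blank rows at the end of the sheet; xlrd reports empty and blank cells as `''`. A minimal sketch that skips such rows, assuming the column is read with xlrd's `sheet.col_values(2)`; `non_empty_urls` is a hypothetical helper:

```python
def non_empty_urls(values):
    """Return (row_index, url) pairs for the cells that contain text.

    `values` is a list of cell values from one column, for example the list
    returned by xlrd's sheet.col_values(2). xlrd reports empty and blank
    cells as '', which would otherwise reach requests.get as Invalid URL ''.
    """
    urls = []
    for row_index, value in enumerate(values):
        # Skip numbers (xlrd returns them as floats) and empty or
        # whitespace-only text; keep the original row index for debugging.
        if isinstance(value, str) and value.strip():
            urls.append((row_index, value.strip()))
    return urls


# Possible use in the question's loop (requires the question's xlrd sheet
# and wiki class):
# for i, url in non_empty_urls(sheet.col_values(2)):
#     data = wiki(url)
#     print(data.head())
```

Keeping the row index makes it easy to print which spreadsheet rows were skipped if the list turns out to contain gaps.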

0 answers