Missing schema error even though the link contains https:
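I am trying to scrape several Wiki pages with Python. I have a list of Wiki URLs in an Excel file.
I wrote a Python class that scrapes a Wiki page, and I run it in a for loop. When I run the code without the for loop I get the output, but when I put the code below inside a for loop I get a missing schema error.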
import re
from bs4 import BeautifulSoup
import requests
import xlrd

wb = xlrd.open_workbook('list.xls')
sheet = wb.sheet_by_index(0)

class wiki:
    def __init__(self, url):
        #self.name =name
        self.url = url
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")

    def urlcont(self):
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")
        print(soup.prettify())

    def head(self):
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")
        title = soup.find(class_='firstHeading').i.text
        return title

for i in range(sheet.nrows):
    url = sheet.cell_value(i, 2)
    print(url)
    data = wiki(url)
    head = data.head()
    print(head)
Error after running this code:
Traceback (most recent call last):
  File "D:\PYTHON\1click\final\alex.py", line 177, in <module>
    movie = wikimovie(movieurl)
  File "D:\PYTHON\1click\final\alex.py", line 69, in __init__
    cont = requests.get(self.url, timeout=5)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 519, in request
    prep = self.prepare_request(req)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 452, in prepare_request
    p.prepare(
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 313, in prepare
    self.prepare_url(url, params)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 387, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?
Output when the for loop is removed:
https://en.wikipedia.org/wiki/######
######
Output when printing all the URLs (with the for loop) without calling the class:
https://en.wikipedia.org/wiki/######
https://en.wikipedia.org/wiki/######
https://en.wikipedia.org/wiki/######
I get the output when the for loop is omitted and the variable i in the line "url = sheet.cell_value(i,2)" is replaced with any other arbitrary value.
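
One thing worth checking: the traceback reports Invalid URL '', i.e. an empty string, which suggests that at least one row read inside the loop has an empty cell in column 2 (for example a trailing blank row), and not that an https link was rejected. Below is a minimal sketch of a guard that could run before calling requests.get(). The helper name has_http_scheme and the sample list cell_values are hypothetical; in the real script the values would come from sheet.cell_value(i, 2).

```python
from urllib.parse import urlparse


def has_http_scheme(value):
    """Return True if value is a string with an http(s) scheme and a host."""
    if not isinstance(value, str):
        # Numeric cells come back from xlrd as floats, not strings.
        return False
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical stand-ins for the values returned by sheet.cell_value(i, 2).
cell_values = ["https://en.wikipedia.org/wiki/Example", "", "  ", 42.0]

# Keep only the values that are safe to pass to requests.get().
valid_urls = [v.strip() for v in cell_values if has_http_scheme(v)]

# Remember which row indexes were skipped so they can be inspected in Excel.
skipped_rows = [i for i, v in enumerate(cell_values) if not has_http_scheme(v)]

print(valid_urls)
print(skipped_rows)
```

Printing repr(url) instead of url inside the original loop is another quick way to spot an empty or whitespace-only cell, since a plain print of '' just shows a blank line.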