Why am I getting a "Connection refused" exception from this Python script?

Asked: 2017-04-26 09:23:43

Tags: python web-scraping beautifulsoup python-requests urllib

I am writing a Python script that uses the requests module to fetch song lyrics from azlyrics. This is the script I wrote:

import requests, re
from bs4 import BeautifulSoup as bs
url = "http://search.azlyrics.com/search.php"
payload = {'q' : 'shape of you'}
r = requests.get(url, params = payload)
soup = bs(r.text,"html.parser")
try:
    link = soup.find('a', {'href':re.compile('http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html')})['href']
    link = link.replace('http', 'https')
    print(link)
    raw_data = requests.get(link)
except Exception as e: 
    print(e)

But I get an exception saying:

Max retries exceeded with url: /lyrics/edsheeran/shapeofyou.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fbda00b37f0>: Failed to establish a new connection: [Errno 111] Connection refused',))

I read online that I might be sending too many requests, so I made the script sleep for a while:

import requests, re
from bs4 import BeautifulSoup as bs
from time import sleep
url = "http://search.azlyrics.com/search.php"
payload = {'q' : 'shape of you'}
r = requests.get(url, params = payload)
soup = bs(r.text,"html.parser")
try:
    link = soup.find('a', {'href':re.compile('http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html')})['href']
    link = link.replace('http', 'https')
    sleep(60)
    print(link)
    raw_data = requests.get(link)
except Exception as e: 
    print(e)

But no luck!

So I tried using urllib.request instead:

import requests, re
from bs4 import BeautifulSoup as bs
from time import sleep
from urllib.request import urlopen
url = "http://search.azlyrics.com/search.php"
payload = {'q' : 'shape of you'}
r = requests.get(url, params = payload)
soup = bs(r.text,"html.parser")
try:
    link = soup.find('a', {'href':re.compile('http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html')})['href']
    link = link.replace('http', 'https')
    sleep(60)
    print(link)
    raw_data = urlopen(link).read()
except Exception as e: 
    print(e)

That gave me a differently worded exception:

<urlopen error [Errno 111] Connection refused>

Can anyone tell me what is wrong with it, and how I can fix it?

1 Answer:

Answer 0: (score: 0)

Try it in a web browser: http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html loads fine, but https://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html does not.

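Remove the line link = link.replace('http', 'https') and try again. The link scraped from the search results already uses http, which is the scheme that works here.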
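
If you would rather not hard-code which scheme works, something like the following might help. It is only a rough sketch, not a tested fix for azlyrics: with_scheme and fetch_first_reachable are hypothetical helper names, it uses urllib from the standard library rather than requests, and it assumes the page is reachable under at least one of the two schemes.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def with_scheme(url, scheme):
    """Return url with only its scheme swapped (hypothetical helper)."""
    parts = urlsplit(url)
    # SplitResult is (scheme, netloc, path, query, fragment); keep the last four.
    return urlunsplit((scheme,) + tuple(parts[1:]))


def fetch_first_reachable(url, schemes=("http", "https"), opener=urlopen):
    """Try url under each scheme in order and return (final_url, body).

    opener is a parameter so the logic can be exercised without a network.
    """
    last_error = None
    for scheme in schemes:
        candidate = with_scheme(url, scheme)
        try:
            with opener(candidate) as response:
                return candidate, response.read()
        except OSError as exc:
            # URLError and ConnectionRefusedError are both OSError subclasses,
            # so "[Errno 111] Connection refused" lands here.
            last_error = exc
    if last_error is None:
        raise ValueError("no schemes given")
    raise last_error
```

In the script above you would call fetch_first_reachable(link) in place of requests.get(link). Swapping the scheme through urllib.parse also avoids a trap in str.replace, which rewrites every occurrence of 'http': applied to a link that is already https, it would produce 'httpss://...'.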