Question

Python 3.6

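I'm trying to scrape a particular site. If I send request headers, the request eventually fails with a ConnectionError timeout; if I don't send any headers, there is no error at all.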

As far as I can tell, I'm using the correct headers.

Here is the URL:

URLS = [r'https://www.nasdaq.com/symbol/AAPL/short-interest']

These are the headers I tried (I tried both):

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}

headers2 = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0'}
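When `headers=` is passed, requests merges it with the session's default headers (`User-Agent`, `Accept-Encoding`, `Accept`, `Connection`), so in both attempts above only the `User-Agent` value changes. The sketch below prepares the request without sending it, so you can compare exactly what goes on the wire in each case. The `browser_headers` dict and the `prepared_headers` helper are hypothetical additions: some bot-protection front ends are reported to treat a browser User-Agent without the other headers a real browser sends as suspicious, but that is an assumption, not something confirmed for this site.

```python
import requests

URL = 'https://www.nasdaq.com/symbol/AAPL/short-interest'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}

# Hypothetical fuller header set resembling what a desktop browser sends.
browser_headers = dict(headers)
browser_headers.update({
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
})


def prepared_headers(url, extra_headers=None):
    """Return the headers requests would send for a GET, without sending it."""
    with requests.Session() as session:
        request = requests.Request('GET', url, headers=extra_headers)
        # prepare_request merges session defaults with the per-request headers
        prepared = session.prepare_request(request)
        return dict(prepared.headers)


if __name__ == '__main__':
    for label, h in [('default', None), ('custom UA', headers),
                     ('browser-like', browser_headers)]:
        print(label)
        for name, value in prepared_headers(URL, h).items():
            print('   ', name + ':', value)
```

Per-request headers override session defaults key by key, so the custom `User-Agent` replaces `python-requests/...` while `Accept: */*` and the other defaults stay unless you set them too.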

Here are the two functions (one with headers, one without):

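import requests

def html_header(url):
    # Sends the custom User-Agent defined above
    html = requests.get(url, headers=headers).text
    return html

def html_non_header(url):
    # Sends requests' default headers (User-Agent: python-requests/x.y.z)
    html = requests.get(url).text
    return html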


if __name__ == '__main__':
    URLS = [r'https://www.nasdaq.com/symbol/AAPL/short-interest']
    html = html_header(URLS[0])       # times out
    html2 = html_non_header(URLS[0])  # works fine

Why does the request fail with a ConnectionError timeout when I use headers?

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
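Strictly speaking, this traceback is not a timeout: `RemoteDisconnected` means the server accepted the TCP connection, read the request, and closed the socket without sending any HTTP response. The self-contained toy below reproduces the same exception locally with a hypothetical server (`start_picky_server`, not a real API) that drops connections whose `User-Agent` contains `Mozilla` and answers everything else. It only illustrates that a server-side rule of that kind produces exactly this error; it does not show that nasdaq.com uses such a rule.

```python
import socket
import threading

import requests


def start_picky_server():
    """Toy HTTP server on localhost. For each connection it reads the request
    headers, then closes the socket without replying if they contain
    'Mozilla'; otherwise it sends a tiny 200 response. Returns the port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]

    def serve():
        while True:
            conn, _ = listener.accept()
            with conn:
                data = b''
                while b'\r\n\r\n' not in data:  # read until end of headers
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    data += chunk
                if b'Mozilla' in data:
                    continue  # leaving the with-block closes conn, no response
                conn.sendall(b'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n'
                             b'Connection: close\r\n\r\nok')

    threading.Thread(target=serve, daemon=True).start()
    return port


if __name__ == '__main__':
    url = 'http://127.0.0.1:{}/'.format(start_picky_server())
    # Default python-requests User-Agent: the toy server answers normally
    print(requests.get(url, timeout=5).text)
    try:
        requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=5)
    except requests.exceptions.ConnectionError as exc:
        print('ConnectionError:', exc)
```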

Request.get times out with a connection error when using headers
