Question

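I want to scrape web results from google.com. I followed the first answer to the question "Google Search Web Scraping with Python", but I get a connection error. I also tried other websites, and none of them connect either. Could this be caused by my company's proxy settings?

Note that I am using a virtual environment named "Webscrapping".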

from urllib.parse import urlencode, urlparse, parse_qs

from lxml.html import fromstring
from requests import get

raw = get("https://www.google.com/search?q=StackOverflow").text
page = fromstring(raw)

for result in page.cssselect(".r a"):
    url = result.get("href")
    if url.startswith("/url?"):
        # parse_qs returns a list per key, so take the first value
        url = parse_qs(urlparse(url).query)['q'][0]
    print(url)

raw = get("https://www.google.com/search?q=StackOverflow").text
Traceback (most recent call last):

  File "", line 1, in
    raw = get("https://www.google.com/search?q=StackOverflow").text

  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)

  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)

  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\sessions.py", line 524, in request
    resp = self.send(prep, **send_kwargs)

  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\sessions.py", line 637, in send
    r = adapter.send(request, **kwargs)

  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)

ConnectionError: HTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: /search?q=StackOverflow (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

Please advise. Thanks.

Edit: I tried pinging google.com, but it failed.

import os
hostname = "https://www.google.com" #example
response = os.system("ping -c 1 " + hostname)

#and then check the response...
if response == 0:
  print(hostname, 'is up!')
else:
  print(hostname, 'is down!')

https://www.google.com is down!

Answer 1

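I think you are getting this error because of your proxy settings. Try running the matching commands below in the command prompt (use the user:password form if your proxy requires authentication), then start Python from that same prompt: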

set http_proxy=http://proxy_address:port
set http_proxy=http://user:password@proxy_address:port
set https_proxy=https://proxy_address:port
set https_proxy=https://user:password@proxy_address:port
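
If setting environment variables is inconvenient, the proxy can also be handed to requests from inside the script. The following is only a rough sketch: `build_proxies` is a hypothetical helper name (not part of requests), `proxy_address`, `8080`, `user` and `p@ss` are placeholders, and it assumes the proxy itself is reached over plain HTTP, which may not match your company's setup.

```python
from urllib.parse import quote
from urllib.request import getproxies


def build_proxies(host, port, user=None, password=None):
    """Return a mapping in the shape accepted by the `proxies=` argument of requests."""
    auth = ""
    if user is not None:
        # Percent-encode credentials so characters such as '@' or ':' do not break the URL.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    # Assumption: the same HTTP proxy handles both http and https traffic.
    proxy_url = f"http://{auth}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# Placeholder values for illustration only; replace with your real proxy details.
proxies = build_proxies("proxy_address", 8080, user="user", password="p@ss")
print(proxies["https"])

# Show the proxies Python currently detects (for example from http_proxy / https_proxy).
# requests picks up the environment variables by default when no proxies= is passed.
print(getproxies())
```

With requests installed, the mapping would then be used as `get("https://www.google.com/search?q=StackOverflow", proxies=proxies, timeout=10)`. If `getproxies()` prints an empty dict after you ran the `set` commands, Python was probably started from a different prompt than the one where the variables were set.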
