Question

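I wrote a Python script that uses a socket (Checking network connection) to check for an internet connection and then uses Selenium to scrape HTML from Yahoo Finance.

Very often (but not always), it fails with a ReadTimeoutError (see below).

I can make it work by checking the internet connection with http.client instead (see below), but I would still like to know why the socket interferes with Selenium.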


import socket
import sys
import time

from selenium import webdriver

def internet(host="8.8.8.8", port=443, timeout=1):
    try:
        socket.setdefaulttimeout(timeout)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        s.shutdown(socket.SHUT_RDWR)
        s.close()
        return True
    except OSError:  
        s.close()
        return False

#  Wait for internet to be available

i = 1
while internet() is False:
    time.sleep(1)
    if i == 300:  # quit if no connection for 5 min (300 seconds)
        print('\nIt has been 5 minutes. Aborting attempt.\n')
        sys.exit(0)
    i += 1

# Get html from yahoo page

symb = 'AAPL'
url = 'http://finance.yahoo.com/quote/{}/history'.format(symb)

chop = webdriver.ChromeOptions()
chop.add_argument('--user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0"')
driver = webdriver.Chrome('/Users/fake_user/Dropbox/Python/chromedriver', chrome_options=chop)
driver.get(url)
html_source = driver.page_source
driver.quit()

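It throws this error:

urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='127.0.0.1', port=58956): Read timed out. (read timeout=<object object at 0x103af7140>)

I can change the internet function as a workaround, but I can't figure out why the socket interferes with Selenium: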

import http.client as httplib

def internet():
    conn = httplib.HTTPConnection("www.google.com", timeout=5)
    try:
        conn.request("HEAD", "/")
        conn.close()
        return True
    except:
        conn.close()
        return False

Answer 1

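From the documentation:

socket.setdefaulttimeout(timeout)

Set the default timeout in seconds (float) for new socket objects. When the socket module is first imported, the default is None. See settimeout() for possible values and their respective meanings.

The problem is that setdefaulttimeout sets a timeout for every socket created afterwards, including the ones Selenium opens. It is a global setting of the socket library.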
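A minimal sketch of that global effect, using only the standard library. The helper name timeout_of_new_socket is made up for this illustration; it creates an unrelated socket and reports the timeout it starts with.

```python
import socket

def timeout_of_new_socket():
    """Create a fresh socket and report the timeout it starts with."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.gettimeout()

previous = socket.getdefaulttimeout()  # None unless something changed it earlier
before = timeout_of_new_socket()

socket.setdefaulttimeout(1)            # the call made inside the original internet()
after = timeout_of_new_socket()        # sockets created later inherit the 1 second timeout

socket.setdefaulttimeout(previous)     # put the global setting back
restored = timeout_of_new_socket()

print(before, after, restored)
```

If the global call has to stay for some reason, restoring the previous value after the check, as in the last step here, should also keep later sockets unaffected.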

If you want the timeout to apply only to this one socket instance, call socket.settimeout(value) (doc) on that socket object.

import socket

def internet(host="8.8.8.8", port=443, timeout=1):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)  # applies to this socket only, not globally
    try:
        s.connect((host, port))
        s.shutdown(socket.SHUT_RDWR)
        return True
    except OSError:
        return False
    finally:
        s.close()  # always release the socket
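Another option, sketched here as an assumption rather than as part of the original answer, is socket.create_connection, which accepts a timeout argument for that single connection and leaves the global default alone. The host, port, and timeout defaults are carried over from the question.

```python
import socket

def internet(host="8.8.8.8", port=443, timeout=1):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # The timeout is passed per connection; the module-wide default is not changed.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, unreachable networks
        return False
```

Because nothing global is modified, sockets created later by other libraries keep whatever default timeout was already in place.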
