Question

I get an error when running this script:

import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

url = "http://nytimes.com,http://nytimes.com"

urls = [url] #stack of urls to scrape
visited = [url] #historic record of urls

while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(htmltext)

Original script:

import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

url = "http://nytimes.com,http://nytimes.com"

urls = [url] #stack of urls to scrape
visited = [url] #historic record of urls

while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(htmltext)

    urls.pop(0)

    print(soup.findAll('a', href=True))

Error:

socket.gaierror: [Errno -2] Name or service not known

urllib.error.URLError: urlopen error [Errno -2] Name or service not known

Traceback (most recent call last):

NameError: name 'htmltext' is not defined

Answer 1

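If urllib.request.urlopen() raises an exception, htmltext is never assigned a value, so printing it in the except block raises a NameError of its own.

As for why urlopen() fails: make sure you pass it a valid URL. The string "http://nytimes.com,http://nytimes.com" is two URLs joined by a comma, not a single valid URL, so the host name cannot be resolved.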
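To make both points concrete, here is a minimal sketch, not a definitive fix. It assumes the comma in the original string was meant to separate two URLs. The helper names split_urls and fetch and the timeout value are hypothetical and do not appear in the question; only urllib.request.urlopen and urllib.error.URLError come from the standard library.

```python
import urllib.error
import urllib.request


def split_urls(raw):
    """Split a comma-separated string into individual URL strings."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    The timeout of 10 seconds is an arbitrary choice for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, ValueError) as exc:
        # Nothing was assigned inside the try block, so report the URL
        # and the exception instead of a variable that does not exist.
        print(f"could not fetch {url}: {exc}")
        return None
```

Calling split_urls("http://nytimes.com,http://nytimes.com") yields two separate URL strings, and each can then be passed to fetch in a loop. Because fetch returns None on failure, the caller can skip that URL rather than handing an undefined htmltext to BeautifulSoup.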
