I get an error when running this script:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
url = "http://nytimes.com,http://nytimes.com"
urls = [url] #stack of urls to scrape
visited = [url] #historic record of urls
while len(urls) >0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(htmltext)
Original script:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
url = "http://nytimes.com,http://nytimes.com"
urls = [url] #stack of urls to scrape
visited = [url] #historic record of urls
while len(urls) >0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(htmltext)
    urls.pop(0)
    print (soup.findAll('a',href=True))
Errors:
socket.gaierror: [Errno -2] Name or service not known
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
Traceback (most recent call last):
NameError: name 'htmltext' is not defined
Answer 0 (score: 2)
If urllib.request.urlopen() raises an exception, htmltext is never assigned a value, so trying to print it in the except block fails with a NameError.
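One way to avoid touching an unassigned htmltext is to skip the rest of the loop body whenever the request fails. The following is only a minimal sketch, not the asker's code: fetch_all and its opener parameter are hypothetical names introduced for illustration, and opener exists solely so the loop can be tried without network access.

```python
import urllib.error
import urllib.request


def fetch_all(urls, opener=urllib.request.urlopen):
    """Fetch each URL in turn and return {url: body} for the ones that worked.

    fetch_all and opener are hypothetical names for this sketch; opener
    defaults to urllib.request.urlopen and can be swapped for a stub.
    """
    pages = {}
    queue = list(urls)  # copy so the caller's list is left alone
    while queue:
        url = queue.pop(0)
        try:
            with opener(url) as response:
                htmltext = response.read()
        except (urllib.error.URLError, ValueError) as exc:
            # htmltext was never assigned for this URL, so report the URL
            # and the error instead, then move on to the next one.
            print(f"could not fetch {url!r}: {exc}")
            continue
        pages[url] = htmltext
    return pages
```

Catching urllib.error.URLError (and ValueError, which urlopen raises for a string with no recognizable scheme) instead of using a bare except keeps unrelated bugs, such as a NameError, from being silently swallowed.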
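As for why urlopen() fails, make sure you are passing it a valid URL. The string "http://nytimes.com,http://nytimes.com" in the question is two URLs joined by a comma, not a single valid URL.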
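If the comma in the url string was meant to separate two addresses (an assumption; the question does not say), the string could be split and each piece checked before it is queued. This is a rough sketch only: split_urls is a hypothetical helper, and the check is deliberately simple (an http or https scheme plus a hostname). Commas can legitimately appear inside URLs, so this splitting rule would not suit every input.

```python
from urllib.parse import urlparse


def split_urls(raw):
    """Split a comma-separated string into a de-duplicated list of URLs.

    split_urls is a hypothetical helper for this sketch. A piece is kept
    only if it has an http/https scheme and a hostname.
    """
    result = []
    for candidate in raw.split(","):
        candidate = candidate.strip()
        parts = urlparse(candidate)
        looks_valid = parts.scheme in ("http", "https") and parts.hostname
        if looks_valid and candidate not in result:
            result.append(candidate)
    return result


print(split_urls("http://nytimes.com,http://nytimes.com"))
# prints ['http://nytimes.com']
```

The resulting list can then be used as the starting value for both urls and visited, so that each entry passed to urlopen() is a single address.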