I have this script that scrapes a website and finds the items I need:
from socket import timeout
from urllib.request import Request, urlopen, URLError
import urllib.parse

import bs4


def track(self):
    for _object in _objects:
        req = Request('http://example.com/item.php?id=' + str(_object))
        req.add_header('User-Agent', 'Mozilla/5.0')
        _URL = urlopen(req).read()
        soup = bs4.BeautifulSoup(_URL, "html.parser")
        allResults = []
        i = 1
        for hit in soup.find_all('cite'):
            if "% Off" in hit.text:
                # str() is needed in case the item ID is not a string
                allResults.append(str(i) + ". " + hit.text + " | Item => " + str(_object))
                i += 1
        if len(allResults) == 0:
            print("No result found for this item => " + str(_object))
        else:
            for element in allResults:
                print(element)
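I want to handle exceptions so that when the connection to the website fails, or the URL can't be reached for any other reason, the script prints "Something happened wrong".
I know I have to use socket.timeout, but where should I put it in the code?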
Answer (score: 1)
Wrap the urlopen call in a try/except block:
try:
    _URL = urlopen(req).read()
except Exception as e:
    print("Something happened wrong: {}".format(e))
    # do something, e.g. continue
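A narrower variant of the same idea, as a sketch: it passes an explicit timeout to urlopen and catches only network errors instead of every Exception. The helper name fetch_page and the 10-second timeout are assumptions, not part of the original code. Without a timeout argument, urlopen falls back to the global socket default (normally no timeout), so socket.timeout would never be raised. In CPython, a timeout while connecting arrives wrapped in URLError, while a timeout while waiting for or reading the response is raised as socket.timeout directly (an alias of TimeoutError since Python 3.10), so the sketch catches both.

```python
import socket
from urllib.error import URLError
from urllib.request import Request, urlopen


def fetch_page(url, timeout=10):
    """Return the response body as bytes, or None if the request failed.

    Hypothetical helper; `timeout` is in seconds and its default is an assumption.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(req, timeout=timeout) as response:
            return response.read()
    except socket.timeout as e:
        # The server accepted the connection but was too slow to respond.
        print("Something happened wrong (timeout): {}".format(e))
    except URLError as e:
        # DNS failure, refused connection, connect timeout, unknown scheme.
        # HTTPError (4xx/5xx) subclasses URLError, so it also lands here.
        print("Something happened wrong: {}".format(e))
    return None
```

Inside track(), the line `_URL = urlopen(req).read()` could then become `_URL = fetch_page('http://example.com/item.php?id=' + str(_object))`, followed by `if _URL is None: continue`, so that one unreachable item does not stop the whole loop.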