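I have a list of URLs that I need to test for existence (about 8000-9000, each on its own line).
What I'm trying to do is automate sending a request to each URL. If the URL is genuine, it should be appended to a file. If connecting to the page fails or the URL is malformed, it should be left out, so that I end up with a file containing only genuine URLs.
I tried to do this in Python with the python-requests library, but my code doesn't work. What am I doing wrong?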
import requests
with open("/origin/file.txt") as urls:
    lines = urls.readlines()
    for i in lines:
        r = requests.get(i)
        try:
            r
        except NameError:
            print "Unsuccessful in connecting, discarding URL..."
        else:
            print "Successful connection! Adding URL to file..."
            with open("/destination/file.txt", "a") as genuine
                print i + "\n"
        del r
urls.close
genuine.close
Answer 0 (score: 0)
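The odd thing about what you're doing is that you aren't wrapping the part most likely to fail, requests.get(someurl), in the try/except block. Instead, you're wrapping the bare response object, and if that object exists at all, requests.get has already succeeded in some way. A failed request raises its exception on the line above the try, so the except clause never runs: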
for i in lines:
    r = requests.get(i)
    try:
        r
    except NameError:
        print "Unsuccessful in connecting, discarding URL..."
    else:
        print "Successful connection! Adding URL to file..."
        with open("/destination/file.txt", "a") as genuine
            print i + "\n"
    del r
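To make the difference concrete, here is a minimal sketch that needs no network access. The names fetch, FakeConnectionError, check_outside and check_inside are hypothetical stand-ins invented for this illustration. fetch plays the role of a requests.get call that always fails.

```python
class FakeConnectionError(Exception):
    """Hypothetical stand-in for the error a failed request would raise."""


def fetch(url):
    # Stand-in for a requests.get call that cannot connect.
    raise FakeConnectionError("could not connect to " + url)


def check_outside(url):
    # Mirrors the question: the call sits *before* the try block,
    # so its exception propagates and the except clause is never reached.
    r = fetch(url)
    try:
        r
    except NameError:
        return "discarded"
    else:
        return "kept"


def check_inside(url):
    # The call sits *inside* the try block, so the failure is caught.
    try:
        fetch(url)
    except FakeConnectionError:
        return "discarded"
    else:
        return "kept"
```

Calling check_inside on any URL returns "discarded", while check_outside lets the FakeConnectionError escape to the caller, which is what stops the original loop at the first bad URL.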
I would suggest something more like this:
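import requests
from requests.exceptions import Timeout, ConnectionError, MissingSchema
...
for url in lines:
    url = url.strip()  # drop the trailing newline left by readlines()
    try:
        resp = requests.get(url)
        if resp.ok:
            # now you can add the url to the file
            with open("/destination/file.txt", "a") as genuine:
                genuine.write(url + "\n")
    except (Timeout, ConnectionError, MissingSchema):
        # the request failed or the URL was malformed, so skip it
        pass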
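Putting the pieces together, a complete script might look roughly like the sketch below. This is one possible arrangement and not the only way to do it. The function names is_reachable, filter_urls and main are hypothetical, the 10-second timeout is an assumed value, and catching the broader RequestException base class (instead of the three specific exceptions above) is a substitution made here so that other request failures are also skipped. The check is passed in as a parameter so the filtering logic can be tried without touching the network.

```python
def is_reachable(url, timeout=10):
    """Return True if a GET request to url succeeds with a non-error status.

    The timeout of 10 seconds is an assumed value, not from the original post.
    """
    # Imported here so filter_urls below can be used with a different check
    # even where the requests library is not installed.
    import requests

    try:
        resp = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Base class covering Timeout, ConnectionError, MissingSchema, etc.
        return False
    return resp.ok


def filter_urls(lines, check=is_reachable):
    """Return the stripped, non-empty lines for which check(url) is true."""
    genuine = []
    for line in lines:
        url = line.strip()  # remove the trailing newline and stray spaces
        if url and check(url):
            genuine.append(url)
    return genuine


def main(src="/origin/file.txt", dst="/destination/file.txt"):
    """Read URLs from src and append the reachable ones to dst."""
    with open(src) as urls:
        genuine = filter_urls(urls)
    with open(dst, "a") as out:
        out.writelines(url + "\n" for url in genuine)
```

Calling main() would then process the whole list. With 8000-9000 URLs checked one at a time, a timeout matters, because requests.get without one can wait indefinitely on an unresponsive host.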