I'm trying to write a script that sends a request to each of the URLs listed in a txt file:
import urllib2

with open('urls.txt') as urls:
    for url in urls:
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!"
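What I want is for a URL to be removed from the file when it comes back 404 Not Found. There is one URL per line, so basically I want to erase the lines whose URLs return 404. How can I do that?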
Answer 0 (score: 1)
You can write the URLs you want to keep to a second file:
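import urllib2

with open('urls.txt', 'r') as urls, open('urls2.txt', 'w') as urls2:
    for line in urls:
        url = line.strip()  # drop the trailing newline before requesting
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        # A plain URLError (e.g. a DNS failure) has no .code attribute.
        code = getattr(r, 'code', None)
        if code in (200, 401):
            print '[{}]: '.format(url), "Up!"
            urls2.write(url + '\n')
        elif code == 404:
            print '[{}]: '.format(url), "Not Found!"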
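For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same idea might be sketched roughly as follows. The names status_of and filter_urls and the fetch_status parameter are illustrative assumptions, not part of the code above; the parameter exists only so the filtering can be tried without network access. Note one deliberate difference: this sketch keeps every URL that does not answer 404, which is what the question asks for, whereas the code above keeps only 200 and 401.

```python
import urllib.error
import urllib.request


def status_of(url):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # HTTPError carries the status code of the error response (404, 401, ...).
        return e.code
    except (OSError, ValueError):
        # URLError (a subclass of OSError) or a malformed URL: no status code.
        return None


def filter_urls(src_path, dst_path, fetch_status=status_of):
    """Copy each URL from src_path to dst_path unless it answers 404."""
    kept = []
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            if fetch_status(url) == 404:
                print('[{}]: Not Found!'.format(url))
                continue
            dst.write(url + '\n')
            kept.append(url)
    return kept
```

Calling filter_urls('urls.txt', 'urls2.txt') would leave the original file untouched and put the surviving URLs in urls2.txt.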
Answer 1 (score: 0)
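To delete lines from a file, you have to rewrite the file's entire contents. The safest way to do that is to write out a new file in the same directory and then rename it over the old file. I'd modify your code like this: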
import os
import sys
import tempfile
import urllib2

good_urls = set()

with open('urls.txt') as urls:
    for line in urls:
        url = line.strip()  # drop the trailing newline
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        # A plain URLError (e.g. a DNS failure) has no .code attribute.
        code = getattr(r, 'code', None)
        if code in (200, 401):
            sys.stdout.write('[{}]: Up!\n'.format(url))
            good_urls.add(url)
        elif code == 404:
            sys.stdout.write('[{}]: Not found!\n'.format(url))
        else:
            sys.stdout.write('[{}]: Unexpected response code {}\n'.format(url, code))

# Write the surviving URLs to a temporary file in the same directory,
# then rename it over the original so a crash never leaves a partial file.
tmp = None
try:
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', dir='.', delete=False)
    for url in sorted(good_urls):
        tmp.write(url + "\n")
    tmp.close()
    os.rename(tmp.name, 'urls.txt')
    tmp = None  # renamed successfully; nothing left to clean up
finally:
    if tmp is not None:
        os.unlink(tmp.name)
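You may also want to add good_urls.add(url) to the else clause in the first loop, so that URLs with unexpected response codes are kept rather than dropped. If anyone knows a simpler way to do what I did with the try-finally there, I'd like to hear it.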
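One possible way to tidy up the try-finally is to move it into a small context manager. The sketch below assumes Python 3.3 or later, because it uses os.replace, which overwrites an existing destination; the names atomic_write and save_urls are made up for illustration and are not part of the code above.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def atomic_write(path):
    """Yield a temp file next to path; rename it over path only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = tempfile.NamedTemporaryFile(
        mode='w', suffix='.tmp', dir=directory, delete=False)
    try:
        with tmp:  # closes the temp file when the caller's block ends
            yield tmp
        os.replace(tmp.name, path)  # overwrites an existing file
    except BaseException:
        # The caller's block or the rename failed: discard the temp file.
        os.unlink(tmp.name)
        raise


def save_urls(path, urls):
    """Rewrite path so that it contains exactly the given URLs, sorted."""
    with atomic_write(path) as out:
        for url in sorted(urls):
            out.write(url + '\n')
```

With this in place, the second half of the script would shrink to save_urls('urls.txt', good_urls). If writing fails partway through, the original urls.txt is left as it was and the temporary file is removed.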