How to append (successful) URL connections to a file

Time: 2014-10-22 01:31:55

Tags: python python-requests

I have a list of URLs that I need to test for existence (roughly 8000-9000 of them, each on its own line).

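What I'm trying to do is automate sending a request to each URL. If the URL is genuine, it should be appended to a file; if connecting to the page fails or the URL is bad, it should not be added. That way I end up with a file containing only genuine URLs.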

I tried to do this in Python with the python-requests library, but my code doesn't work. What am I doing wrong?

import requests

with open("/origin/file.txt") as urls:
    lines = urls.readlines()

for i in lines:
    r = requests.get(i)
    try:
        r
    except NameError:
        print "Unsuccessful in connecting, discarding URL..."
    else:
        print "Successful connection! Adding URL to file..."
        with open("/destination/file.txt", "a") as genuine
            print i + "\n"
    del r

urls.close
genuine.close

1 answer:

Answer 0 (score: 0)

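The odd thing about what you're doing is that you aren't wrapping requests.get(someurl), the part most likely to fail, in the try/except block. Instead, you're wrapping the response object, and if that object exists at all, requests.get has already succeeded in some sense: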

for i in lines:
    r = requests.get(i)
    try:
        r
    except NameError:
        print "Unsuccessful in connecting, discarding URL..."
    else:
        print "Successful connection! Adding URL to file..."
        with open("/destination/file.txt", "a") as genuine
            print i + "\n"
    del r
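To make the problem concrete, here is a minimal sketch that needs no network access. `fake_get` is a hypothetical stand-in for `requests.get` that always fails, and Python 3's built-in `ConnectionError` stands in for the exception requests would raise; neither comes from the original code. It shows that a call made before the `try` block escapes the `except` clause entirely, while the same call inside the block is caught.

```python
def fake_get(url):
    """Hypothetical stand-in for requests.get that always fails to connect."""
    raise ConnectionError("cannot reach " + url)


def check_outside_try(url):
    # Mirrors the question: the call runs before the try block,
    # so its exception propagates and the except clause never sees it.
    r = fake_get(url)
    try:
        r  # evaluating an already-bound name cannot raise NameError
    except NameError:
        return "discarded"
    else:
        return "kept"


def check_inside_try(url):
    # The call runs inside the try block, so the failure is caught.
    try:
        fake_get(url)
    except ConnectionError:
        return "discarded"
    else:
        return "kept"


try:
    outcome = check_outside_try("http://example.invalid/")
except ConnectionError:
    outcome = "crashed"

print(outcome)                                      # crashed
print(check_inside_try("http://example.invalid/"))  # discarded
```

With the real library the first version would stop the whole script at the first unreachable URL instead of skipping it.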

I would suggest something more like this:

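import requests
from requests.exceptions import Timeout, ConnectionError, MissingSchema

...
    for url in lines:
        url = url.strip()  # readlines() keeps the trailing newline
        try:
            resp = requests.get(url)
            if resp.ok:
                # the request succeeded, so append the url to the file
                with open("/destination/file.txt", "a") as genuine:
                    genuine.write(url + "\n")
        except (Timeout, ConnectionError, MissingSchema):
            # the request failed or the url was malformed: discard it
            pass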
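For reference, the same idea can be sketched as a small self-contained Python 3 script. Several things here are assumptions rather than part of the answer: the helper names `fetch_ok`, `keep_reachable`, and `filter_file` are made up for illustration, the 10-second timeout is an arbitrary choice, and the code catches `requests.exceptions.RequestException` (the base class of the library's exceptions) instead of the three specific exceptions listed above, so that other failures such as an invalid URL are also treated as "discard".

```python
def fetch_ok(url, timeout=10):
    """Return True if a GET request to url completes with a non-error status."""
    # Imported here so the filtering logic below can be exercised without requests.
    import requests
    try:
        return requests.get(url, timeout=timeout).ok
    except requests.exceptions.RequestException:
        # Connection failures, timeouts, malformed URLs and similar errors.
        return False


def keep_reachable(lines, is_reachable=fetch_ok):
    """Return the stripped, non-empty URLs for which is_reachable(url) is true."""
    kept = []
    for line in lines:
        url = line.strip()  # drop the trailing newline and surrounding spaces
        if url and is_reachable(url):
            kept.append(url)
    return kept


def filter_file(src_path, dst_path, is_reachable=fetch_ok):
    """Append the reachable URLs from src_path to dst_path; return how many."""
    with open(src_path) as urls:
        good = keep_reachable(urls, is_reachable)
    with open(dst_path, "a") as genuine:
        genuine.writelines(url + "\n" for url in good)
    return len(good)
```

Calling `filter_file("/origin/file.txt", "/destination/file.txt")` would then process the files from the question. Passing the reachability check as a parameter is a design choice that lets the filtering be tried out with a dummy function before sending thousands of real requests.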