Question

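I have a list of URLs (about 8,000-9,000, each on its own line), and I need to test whether they exist.

What I'm trying to do is automate sending a request to each URL. If it's a genuine URL, append it to a file; if there's an error connecting to the page or the URL is bad, don't add it. That way I end up with a file containing only genuine URLs.

I tried to do this in Python with the python-requests library, but my code isn't working. What am I doing wrong?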

import requests

with open("/origin/file.txt") as urls:
    lines = urls.readlines()

for i in lines:
    r = requests.get(i)
    try:
        r
    except NameError:
        print "Unsuccessful in connecting, discarding URL..."
    else:
        print "Successful connection! Adding URL to file..."
        with open("/destination/file.txt", "a") as genuine
            print i + "\n"
    del r

urls.close
genuine.close

Answer 1

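The odd thing about what you're doing is that you haven't wrapped the part most likely to fail, requests.get(someurl), in the try/except block. Instead, you're wrapping the response object (which, if it exists, means requests.get succeeded in some fashion):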

for i in lines:
    r = requests.get(i)
    try:
        r
    except NameError:
        print "Unsuccessful in connecting, discarding URL..."
    else:
        print "Successful connection! Adding URL to file..."
        with open("/destination/file.txt", "a") as genuine
            print i + "\n"
    del r
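
To see why that try block never does anything useful, here is a small sketch. It assumes Python 3 and uses a hypothetical stand-in, fake_get (not part of requests), which always raises the built-in ConnectionError, roughly the way requests.get raises its own exception for an unreachable host. The function names are made up for illustration.

```python
def fake_get(url):
    # Hypothetical stand-in for requests.get on an unreachable host.
    raise ConnectionError("cannot reach " + url)


def call_outside_try(url):
    r = fake_get(url)  # raises here, before the try block is ever entered
    try:
        r
    except NameError:
        return "discarded"
    return "kept"


def call_inside_try(url):
    try:
        fake_get(url)  # the risky call is now guarded
    except ConnectionError:
        return "discarded"
    return "kept"
```

Calling call_outside_try lets the exception escape and stops the loop at the first bad URL, while call_inside_try returns "discarded" and would let the loop move on.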

I would suggest something more like this:

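import requests
from requests.exceptions import Timeout, ConnectionError, MissingSchema

...
    for url in lines:
        try:
            # strip the trailing newline that readlines() leaves on each line
            resp = requests.get(url.strip())
            if resp.ok:
                # now you can add the url to the file
                pass
        except (Timeout, ConnectionError, MissingSchema):
            # could not connect, or the URL was malformed: skip it
            pass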
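
Putting the suggestion above together, one possible end-to-end version might look like the sketch below. It assumes Python 3 and an installed requests package. The helper names (url_is_live, keep_live_urls, copy_live_urls) and the 10-second timeout are my own choices for illustration, not something from the question. It also catches requests.exceptions.RequestException, the common base class of Timeout, ConnectionError and MissingSchema, which is broader than the three exceptions listed above.

```python
def url_is_live(url, timeout=10):
    """Return True if a GET request to url succeeds with a non-error status."""
    # Imported here so the filtering helper below can be tried with a stub checker.
    import requests
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Base class of Timeout, ConnectionError, MissingSchema and others.
        return False
    return resp.ok


def keep_live_urls(lines, check=url_is_live):
    """Return the stripped, non-empty URLs from lines for which check is true."""
    live = []
    for line in lines:
        url = line.strip()  # lines read from a file keep their trailing newline
        if url and check(url):
            live.append(url)
    return live


def copy_live_urls(src, dst, check=url_is_live):
    """Append every live URL found in the src file to the dst file."""
    with open(src) as urls:
        live = keep_live_urls(urls, check)
    with open(dst, "a") as genuine:
        for url in live:
            genuine.write(url + "\n")
```

With the paths from the question, the call would be copy_live_urls("/origin/file.txt", "/destination/file.txt"). Passing the checker as a parameter is only a convenience: it lets you try the filtering logic with a fake checker before pointing it at thousands of real URLs.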
