Question

I am trying to call 3 URLs concurrently and log any errors. Here is my sample code:

urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"]

try:
    results = pool.map(urllib2.urlopen, urls)
except URLError:
    urllib2.urlopen("https://example.com/log_error/?url=" + URLError.url)

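I just want to find out which URLs (if any) fail, by having each failure call the /log_error/ URL. But with the code above I get an error saying that URLError is not defined.

I have these imports at the top of my code: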

import urllib2 
from multiprocessing.dummy import Pool as ThreadPool

Here is the full error response (this is running on AWS Lambda, for what it's worth):

{
  "stackTrace": [
    [
      "/var/task/lambda_function.py",
      27,
      "lambda_handler",
      "except Error as e:"
    ]
  ],
  "errorType": "NameError",
  "errorMessage": "global name 'URLError' is not defined"
}

How can I catch the failing URLs so that I know which ones they are?

Update

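I figured it out: `import urllib2` does not define the bare name URLError. In Python 2 the class has to be written as urllib2.URLError; urllib.error is where it lives in Python 3, not in urllib2.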

The note at the top of this documentation page explains it: https://docs.python.org/2/library/urllib2.html

This is the more detailed HTTPError object that I actually get: https://docs.python.org/2/library/urllib2.html#urllib2.HTTPError
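The NameError in the traceback comes from how Python resolves names: importing a module binds only the module name, so an exception class inside it must be module-qualified or imported explicitly. Below is a minimal sketch of that rule. It is written for Python 3, where the class is `urllib.error.URLError` (an assumption made so the example runs on a current interpreter; in the Python 2 code above the equivalent is `urllib2.URLError`). The function names are hypothetical and no network access is involved.

```python
import urllib.error  # binds the module only, not the names inside it


def catch_with_qualified_name():
    """Raise a URLError locally and catch it through the module-qualified name."""
    try:
        raise urllib.error.URLError("simulated failure")  # no network involved
    except urllib.error.URLError as e:
        return str(e.reason)


def catch_with_bare_name():
    """Same as above, but the except clause uses a name that was never imported."""
    try:
        raise urllib.error.URLError("simulated failure")
    except URLError:  # looked up only when an exception arrives, then NameError
        return "unreachable"
```

Because the name in an `except` clause is looked up only when an exception actually reaches it, code like this can run without complaint until the first request fails.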

There is still the problem of the failing URL itself, though: at the moment I cannot tell which URL caused the error.

Update 2

Apparently str(e.url) is all I needed. I did not find any documentation about this; it was just a lucky guess on my part.

Here is the working code now:

urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"]

try:
    results = pool.map(urllib2.urlopen, urls)
except Exception as e:
    # e.url and e.code are set on urllib2.HTTPError; a plain URLError has neither
    urllib2.urlopen("https://example.com/log_error/?url=" + str(e.url) + "&code=" + str(e.code) + "&reason=" + str(e.reason))
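One caveat about reading `e.url` and `e.code` after catching a broad `Exception`: those attributes are set on HTTPError (the server answered with an error status), while a plain URLError (for example a DNS or connection failure) carries only a reason. The sketch below illustrates a defensive way to read them. It uses Python 3's `urllib.error` as an assumption so that it runs on a current interpreter, `describe_failure` is a hypothetical helper, and both exceptions are built by hand rather than produced by real requests.

```python
import io
import urllib.error


def describe_failure(exc):
    """Build a log-friendly dict from a urllib exception without assuming attributes exist."""
    return {
        "url": getattr(exc, "url", None),    # present on HTTPError
        "code": getattr(exc, "code", None),  # present on HTTPError
        "reason": str(exc.reason),           # present on both exception types
    }


# Exceptions constructed by hand so the example needs no network access.
http_error = urllib.error.HTTPError(
    "https://example.com/gives500.php", 500, "Internal Server Error", {}, io.BytesIO(b"")
)
dns_error = urllib.error.URLError("name resolution failed")

print(describe_failure(http_error))
print(describe_failure(dns_error))
```

With this shape, a connection-level failure is still logged instead of raising an AttributeError inside the handler.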

Update 3

Thanks to @mfripp for informing me about the dangers of pool.map, I have revised this code once more to:

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        # Report the failing URL to the logging endpoint, then signal failure with None
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None

def lambda_handler(event, context):

    urls = [
        "https://example.com/gives200.php",
        "https://example.com/alsogives200.php",
        "https://example.com/gives500.php"
    ]

    # Map the wrapper rather than urllib2.urlopen, so each failure is caught per URL
    results = pool.map(my_urlopen, urls)

    return urls
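One hazard of calling pool.map on a function that can raise, and presumably the one referred to in this update, is that a single failing item makes the whole call raise, so the caller gets neither the successful results nor the input that failed. The following is a small sketch of that behavior using a thread pool and a stand-in function instead of real HTTP calls. `flaky` and `map_all_or_nothing` are hypothetical names introduced for illustration.

```python
from multiprocessing.dummy import Pool as ThreadPool


def flaky(n):
    """Stand-in for a network call: fails for one input, succeeds for the rest."""
    if n == 2:
        raise ValueError("item 2 failed")
    return n * 10


def map_all_or_nothing(items):
    """Run flaky over items with pool.map; return (results, error)."""
    pool = ThreadPool(3)
    try:
        return pool.map(flaky, items), None
    except ValueError as e:
        # The values computed for the other items are not available here.
        return None, e
    finally:
        pool.close()
        pool.join()
```

Catching the exception inside the mapped function, as my_urlopen does, keeps one failure from discarding the other results.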

Answer 1

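from multiprocessing import Pool
import urllib2

# Asynchronous request: fetch one URL and report the URL itself if it fails
def async_request(url):
    try:
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        print response.info()
    except urllib2.URLError as e:
        # Print the failing URL instead of silently swallowing the error
        print url, e

# `links` is assumed to be a list of URL strings defined elsewhere
pool = Pool()
pool.map(async_request, links)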

Answer 2

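I am not sure whether the exception object will give you details about the URL that failed. If it does not, you need to wrap each call to urllib2.urlopen(url) in its own try/except. You could do it like this: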

urls = [
    "https://example.com/gives200.php", 
    "https://example.com/alsogives200.php", 
    "https://example.com/gives500.php"
]

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        # Qualified name: `import urllib2` alone does not define a bare URLError
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None

results = pool.map(my_urlopen, urls)
# At this point, any failed requests will have None as their value
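A variation on the wrapper above is to return the URL together with the outcome, so that the caller can list the failures afterwards rather than only seeing None. Here is a minimal sketch under a few assumptions: it uses Python 3's `urllib.request` and `urllib.error` in place of urllib2, `fetch_status` and `fetch_all` are hypothetical names, and the `opener` parameter exists only so the logic can be exercised without network access.

```python
from multiprocessing.dummy import Pool as ThreadPool
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Return (url, status, error); exactly one of status and error is None."""
    try:
        response = opener(url)
        return url, response.getcode(), None
    except urllib.error.URLError as e:  # also covers HTTPError, its subclass
        return url, None, e


def fetch_all(urls, opener=urllib.request.urlopen, workers=4):
    """Fetch every URL on a thread pool and split the outcome into successes and failures."""
    pool = ThreadPool(workers)
    try:
        # pool.map preserves input order, so results line up with urls
        results = pool.map(lambda u: fetch_status(u, opener), urls)
    finally:
        pool.close()
        pool.join()
    succeeded = [(url, status) for url, status, err in results if err is None]
    failed = [(url, err) for url, status, err in results if err is not None]
    return succeeded, failed
```

Calling `fetch_all(urls)` returns two lists. Each entry in the second pairs a failing URL with its exception, which is the information the log_error call needs.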

Answer 3

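Edit: see Update 3 above. mfripp's answer needs to be merged with this one for the solution to be complete.

I updated the original post to explain, but this is exactly the code I needed. I could not find any documentation that led me to e.url; it was just a lucky guess on my part.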

urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php"
]

try:
    results = pool.map(urllib2.urlopen, urls)
except Exception as e:
    # e.url and e.code are set on urllib2.HTTPError; a plain URLError has neither
    urllib2.urlopen("https://example.com/log_error/?url=" + str(e.url) + "&code=" + str(e.code) + "&reason=" + str(e.reason))

Python: How to know which URL failed with urllib2 and pool.map?

3 answers: