Python: How can I tell which URL failed when using urllib2 with pool.map?

Time: 2017-04-26 19:52:42

Tags: python aws-lambda

I am trying to call 3 URLs at the same time and log any errors. Here is my sample code:

urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"];

try:
    results = pool.map(urllib2.urlopen, urls);
except URLError:
    urllib2.urlopen("https://example.com/log_error/?url="+URLError.url);

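I just want to find out which URLs (if any) fail, by having the code call the /log_error/ URL for them. But with the code written like this, I get an error saying that URLError is not defined.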

I have these imports at the top of my code:

import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

Here is my entire error response (this is running on AWS Lambda, for whatever that is worth):

{
  "stackTrace": [
    [
      "/var/task/lambda_function.py",
      27,
      "lambda_handler",
      "except Error as e:"
    ]
  ],
  "errorType": "NameError",
  "errorMessage": "global name 'URLError' is not defined"
}

How can I catch the failing URLs so that I know which ones they are?

Update

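I figured it out: URLError is defined inside a module and is not a global name. In Python 3 it lives in urllib.error; in Python 2 it is reached as urllib2.URLError.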

The note at the top of this documentation page explains it: https://docs.python.org/2/library/urllib2.html

This is the more detailed HTTPError object that I actually get: https://docs.python.org/2/library/urllib2.html#urllib2.HTTPError

The problem of the failing URL itself remains, though: at the moment I cannot tell which URL is the one that failed.
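For reference, a minimal sketch of how the exception classes can be named so that the except clause resolves. The import shim is an assumption added here so the snippet also runs on Python 3, where urllib.request exposes the same names; on Python 2 only the first import is used. describe_error is a hypothetical helper, not something from the original post.

```python
# Compatibility shim (an assumption, not part of the original post):
# on Python 3, urllib.request exposes urlopen, URLError and HTTPError too.
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2


def describe_error(exc):
    """Hypothetical helper: classify an exception raised by urlopen."""
    # HTTPError must be tested first because it subclasses URLError.
    if isinstance(exc, urllib2.HTTPError):
        return "http error %s" % (exc.code,)
    if isinstance(exc, urllib2.URLError):
        return "url error: %s" % (exc.reason,)
    return "other error"
```

Because HTTPError is a subclass of URLError, an `except urllib2.URLError` clause catches both kinds of failure.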

Update 2

Apparently str(e.url) is what I needed. I did not find any documentation about it; it was just a lucky guess on my part.

This is the working code now:

urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"];

try:
    results = pool.map(urllib2.urlopen, urls);
except Exception as e:
    urllib2.urlopen("https://example.com/log_error/?url="+str(e.url)+"&code="+str(e.code)+"&reason="+e.reason);
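One caveat with the snippet above: url and code are normally available on an HTTPError, but a plain URLError (a DNS failure, for example) carries only reason, and reason is not always a string. A hedged sketch of building the log URL defensively follows. build_log_url and LOG_ENDPOINT are hypothetical names; the endpoint and its query parameters are the ones used in the question.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

LOG_ENDPOINT = "https://example.com/log_error/"  # endpoint from the question


def build_log_url(exc, fallback_url=""):
    """Hypothetical helper: turn an exception into a logging URL."""
    params = {
        # Attributes may be missing on a plain URLError, hence getattr.
        "url": str(getattr(exc, "url", fallback_url)),
        "code": str(getattr(exc, "code", "")),
        # reason can be another exception object, so convert it explicitly.
        "reason": str(getattr(exc, "reason", exc)),
    }
    # urlencode escapes characters such as ':' '/' and '&' in the values.
    return LOG_ENDPOINT + "?" + urlencode(sorted(params.items()))
```

Escaping the values also keeps a failing URL that contains its own query string from being split into extra parameters at the logging endpoint.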

Update 3

Thanks to @mfripp for informing me about the dangers of pool.map. I have modified this code once more, to:

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        # URLError must be qualified with its module; HTTPError is a subclass, so it is caught too
        urllib2.urlopen("https://example.com/log_error/?url="+url)
        return None

def lambda_handler(event, context):

    urls = [
        "https://example.com/gives200.php", 
        "https://example.com/alsogives200.php", 
        "https://example.com/gives500.php"
    ];

    # Map the wrapper, not urllib2.urlopen, so each failure is caught per URL
    results = pool.map(my_urlopen, urls);

    return urls;
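Putting the pieces of this update together, a self-contained sketch might look like the following. It is an illustration under assumptions, not the asker's final handler: fetch_all, safe_open, the opener parameter, and the worker count of 4 are hypothetical, and the Python 3 import fallback is added only so the snippet runs on either version.

```python
from multiprocessing.dummy import Pool as ThreadPool

try:
    import urllib2                          # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2        # assumption: Python 3 fallback


def fetch_all(urls, opener=urllib2.urlopen, workers=4):
    """Hypothetical helper: return (ok, failed); failed maps URL -> error."""

    def safe_open(url):
        # Catch inside the worker so one failure cannot abort the whole map
        # and so the failing URL is still known at this point.
        try:
            return url, opener(url), None
        except Exception as exc:  # broad on purpose: socket, SSL, HTTP errors
            return url, None, exc

    pool = ThreadPool(workers)
    try:
        outcomes = pool.map(safe_open, urls)
    finally:
        pool.close()
        pool.join()

    ok = dict((u, r) for u, r, e in outcomes if e is None)
    failed = dict((u, e) for u, r, e in outcomes if e is not None)
    return ok, failed
```

Passing the opener as a parameter is a design choice made here so the logic can be exercised with a stand-in function instead of real network calls.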

3 answers:

Answer 0: (score: 1)

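from multiprocessing import Pool
import urllib2

# Fetch one URL; runs inside a worker process
def async_request(url):
    try:
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        print response.info()
    except urllib2.URLError as e:
        # HTTPError is a subclass of URLError; report which URL failed
        print url, e

# links is the list of URLs to request (defined elsewhere)
pool = Pool()
pool.map(async_request, links)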

Answer 1: (score: 1)

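I am not sure whether the exception object will give you details about which URL failed. If it does not, you will need to wrap each call to urllib2.urlopen(url) in its own try/except block. You can do it like this: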

urls = [
    "https://example.com/gives200.php", 
    "https://example.com/alsogives200.php", 
    "https://example.com/gives500.php"
]

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        urllib2.urlopen("https://example.com/log_error/?url="+url)
        return None

results = pool.map(my_urlopen, urls)
# At this point, any failed requests will have None as their value
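Since pool.map returns results in the same order as its input, the failed URLs can be recovered afterwards by pairing the two lists. A small sketch, assuming the None-on-failure convention used above; failed_urls is a hypothetical helper name.

```python
def failed_urls(urls, results):
    """Hypothetical helper: URLs whose result slot is None."""
    # pool.map preserves input order, so zip() lines each URL up
    # with its own outcome.
    return [url for url, result in zip(urls, results) if result is None]
```

For example, `failed_urls(urls, results)` called right after the `pool.map` line would list every URL for which my_urlopen returned None.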

Answer 2: (score: 1)

Edit: see Update 3 above. mfripp's answer needs to be combined with this one for a complete solution.

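I updated the original post to explain, but this is exactly the code I needed. I could not find any documentation that led me to e.url; it was just a lucky guess in the end.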

urls = [
    "https://example.com/gives200.php", 
    "https://example.com/alsogives200.php", 
    "https://example.com/gives500.php"
];

try:
    results = pool.map(urllib2.urlopen, urls);
except Exception as e:
    urllib2.urlopen("https://example.com/log_error/?url="+str(e.url)+"&code="+str(e.code)+"&reason="+e.reason);
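The reason a per-URL wrapper still matters: pool.map re-raises a single exception and does not return the other results, so a second failing URL would go unreported by the code above. A small sketch with a stand-in fetch function illustrates this. fake_fetch and first_error_only are hypothetical and perform no network access.

```python
from multiprocessing.dummy import Pool as ThreadPool


def fake_fetch(url):
    """Stand-in for urlopen: fails for any URL containing '500'."""
    if "500" in url:
        raise IOError("failed: " + url)
    return "ok: " + url


def first_error_only(urls):
    """Return the messages of the exceptions that reach the caller."""
    pool = ThreadPool(2)
    seen = []
    try:
        pool.map(fake_fetch, urls)
    except IOError as exc:
        # Only one exception propagates, even if several calls failed.
        seen.append(str(exc))
    finally:
        pool.close()
        pool.join()
    return seen
```

With two failing inputs the returned list still holds one message, and which of the two it is can depend on thread timing.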