Question

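I'd like a sanity check on this Python script. My goal is to feed it a list of URLs and get a byte size for each one, as an indication of whether the URL is good or bad.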

import urllib2
import shutil

urls = (LIST OF URLS)

def getUrl(urls):
    for url in urls:
        file_name = url.replace('https://','').replace('.','_').replace('/','_')
        try:
            response = urllib2.urlopen(url)
        except urllib2.HTTPError, e:
            print e.code
        except urllib2.URLError, e:
            print e.args
        print urls, len(response.read())
        with open(file_name,'wb') as out_file:
            shutil.copyfileobj(response, out_file)
getUrl(urls)

The problem I'm having is that my output looks like this:

(list of URLs) 22511
(list of URLs) 56472
(list of URLs) 8717
...

How can I display just one URL with its byte size?
Is there a better way to get these results?

Answer 1

Try

print url, len(response.read())

instead of

print urls, len(response.read())

You're printing the whole list every time. Just print the current item.
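For reference, here is a sketch of the whole loop with that fix, written for Python 3, where urllib2 was split into urllib.request and urllib.error. It also addresses two other problems in the question's script: `response.read()` exhausts the response stream, so the later `shutil.copyfileobj(response, out_file)` has nothing left to copy and writes an empty file; and when a request fails, `response` is either undefined or left over from the previous URL. The function name `url_sizes` and its `save`/`timeout` parameters are hypothetical, not part of the question.

```python
import urllib.error
import urllib.request


def url_sizes(urls, save=False, timeout=10):
    """Fetch each URL once and return a list of (url, size_in_bytes) pairs.

    size_in_bytes is None when the request failed.
    """
    results = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                body = response.read()  # read the body exactly once
        except urllib.error.HTTPError as e:  # must precede URLError, its base class
            print(url, "HTTP error", e.code)
            results.append((url, None))
            continue  # no usable response; go on to the next URL
        except urllib.error.URLError as e:
            print(url, "URL error", e.reason)
            results.append((url, None))
            continue
        print(url, len(body))  # the current URL, not the whole list
        results.append((url, len(body)))
        if save:
            # Same file-naming scheme as the question's script.
            file_name = url.replace('https://', '').replace('.', '_').replace('/', '_')
            with open(file_name, 'wb') as out_file:
                out_file.write(body)  # reuse the bytes already read
    return results
```

Called as `url_sizes(urls)`, it prints one `url size` line per URL (or an error line) and returns the pairs, so the sizes can be inspected afterwards instead of only printed.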

There are other ways to determine a page's size, described here and here; there's no need for me to repeat that information.

Edit

Perhaps you'd consider using requests instead of urllib2.

You can easily extract content-length from a HEAD request and avoid a full GET, e.g.

import requests
h = requests.head('http://www.google.com')
print h.headers['content-length']

Making a HEAD request with urllib2 or another library is detailed here.
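If you would rather stay with the standard library, the same HEAD-request idea can be sketched with Python 3's urllib.request (the successor of urllib2) by passing `method="HEAD"` to `Request`. The helper name `head_content_length` is hypothetical. Two caveats: servers are not required to send Content-Length (chunked responses, for example, omit it), and some servers refuse HEAD altogether, so a missing value on its own does not necessarily mean the URL is bad.

```python
import urllib.request


def head_content_length(url, timeout=10):
    """Return the Content-Length reported for url by a HEAD request, or None."""
    request = urllib.request.Request(url, method="HEAD")  # ask for headers only
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")  # lookup is case-insensitive
    # The header is optional, so report its absence instead of raising KeyError.
    return int(value) if value is not None else None
```

A server that rejects HEAD (for example with 405 Method Not Allowed) surfaces as `urllib.error.HTTPError`, which can be caught the same way as in the question's loop, falling back to a normal GET if needed.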

Answer 2

How can I display just one URL with its byte size?

Obviously: don't write

print urls, ...

but

print url, ...
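The question's stated goal is to turn each size into a good/bad indication. One way to do that, sketched under an assumption the question does not state: a URL counts as bad when it could not be fetched at all (size None) or returned fewer bytes than some threshold. Both `classify` and the `MIN_BYTES` value of 1024 are hypothetical; a sensible cutoff depends on the pages being checked.

```python
MIN_BYTES = 1024  # hypothetical threshold; tune it for the sites being checked


def classify(results, min_bytes=MIN_BYTES):
    """Label each (url, size) pair as 'good' or 'bad'.

    size is None when the URL could not be fetched at all.
    """
    labels = {}
    for url, size in results:
        if size is None or size < min_bytes:
            labels[url] = "bad"
        else:
            labels[url] = "good"
    return labels


if __name__ == "__main__":
    # Hypothetical URLs; 22511 is one of the sizes from the question's output.
    sample = [("https://example.com/a", 22511),
              ("https://example.com/b", 312),
              ("https://example.com/c", None)]
    for url, label in classify(sample).items():
        print(url, label)
```

Keeping the fetching and the judgement separate like this makes it easy to change the threshold without re-downloading anything.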

Is there a better way to retrieve a web page's size in Python?
