How can I retrieve the URLs whose size on the server differs from the size of the already-downloaded file on disk?
import os, glob, urllib

urls_file = open('urls.txt', 'r')
urls = urls_file.read().splitlines()
urls_file.close()

for u in urls:
    data = urllib.urlopen(u)
    size_server = data.info()['Content-Length']

files_disk = glob.glob('*.jpg')
for f in files_disk:
    size_disk = os.stat(f).st_size
After that, I don't know how to proceed. Please help.
Answer 0: (score: 2)
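So I assume you're fetching images here, and you want a list of the URLs whose Content-Length header doesn't match the size of the corresponding file on disk.
Try this: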
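import glob
import os
import urllib
from urlparse import urlparse  # Python 2 module

# Map each expected file name to (url, size reported by the server).
url_size = {}
with open('urls.txt') as f:
    for line in f:
        url = line.strip()
        if url:
            try:
                data = urllib.urlopen(url)
                # Content-Length arrives as a string; convert it before comparing.
                size = int(data.info()['Content-Length'])
                url_size[os.path.basename(urlparse(url).path)] = (url, size)
            except (IOError, KeyError, ValueError):
                print('Cannot fetch information for: {}'.format(url))

# Compare every downloaded image with the size the server reported.
for fname in glob.glob('*.jpg'):
    try:
        disk_size = os.stat(fname).st_size
    except OSError:
        print('Cannot fetch file size for {}'.format(fname))
        continue
    url, size = url_size.get(fname, (None, None))
    if size != disk_size:
        print('{} ({}) does not match fetched size of {}'.format(fname, url, size))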
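Don't forget to import the libraries: besides os, glob, and urllib, the code needs urlparse (from the urlparse module in Python 2, or from urllib.parse in Python 3).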
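For readers on Python 3, where `urllib.urlopen` no longer exists, the same idea might look roughly like the sketch below. It is not the code from the answer above: the function names `remote_size` and `mismatched_urls`, the `size_lookup` parameter, the use of a HEAD request, and the choice to also report URLs whose file is missing on disk are all assumptions made for illustration. URLs whose server size cannot be determined are skipped rather than reported.

```python
import os
import urllib.request
from urllib.parse import urlparse


def remote_size(url):
    """Return the Content-Length the server reports for url, or None."""
    # Assumption: the server answers HEAD requests and sends Content-Length.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            length = response.headers.get("Content-Length")
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return None
    return int(length) if length and length.isdigit() else None


def mismatched_urls(urls, directory=".", size_lookup=remote_size):
    """Return the URLs whose local copy is missing or differs in size."""
    mismatched = []
    for url in urls:
        fname = os.path.basename(urlparse(url).path)
        path = os.path.join(directory, fname)
        expected = size_lookup(url)
        if expected is None:
            continue  # server size unknown, so there is nothing to compare
        if not os.path.isfile(path) or os.path.getsize(path) != expected:
            mismatched.append(url)
    return mismatched
```

Calling `mismatched_urls(urls)` with the lines read from urls.txt would return the list of URLs to download again. Passing a different `size_lookup` (for example a dictionary's `get` method) makes it possible to try the comparison logic without touching the network.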