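I am scraping a website and saving its images locally. It works well, but some images have different paths and the same file name, so they overwrite each other on my machine even though they are different images.
How can I save all of the images without overwriting earlier ones? I was thinking of adding a counter prefix to each image name, but I can't figure out how to do it.
Here is the code: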
import re
from urllib.parse import urlparse  # Python 3 location of urlparse

import requests

# Save images (urls and site are defined earlier in the script)
for url in urls:
    filename = re.search(r'([\w_-]+[.](jpg|gif|png))$', url)
    filename = re.sub(r'\d{4,}\.', '.', filename.group(0))
    with open(filename, 'wb') as f:
        if 'http' not in url:
            # Sometimes an image source is relative. If so, prepend
            # the base URL, which is currently the site variable.
            hostname = urlparse(site).hostname
            scheme = urlparse(site).scheme
            url = '{}://{}/{}'.format(scheme, hostname, url)
        # For the full-resolution image, the run of four or more digits
        # before the extension needs to be stripped.
        url = re.sub(r'\d{4,}\.', '.', url)
        print('Fetching image from {} to {}'.format(url, filename))
        response = requests.get(url)
        f.write(response.content)
Answer 0 (score: 1)
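You can write a helper like this:
import datetime

def timeStamped(fname, fmt='%Y-%m-%d-%H-%M-%S_{fname}'):
    # Prefix the file name with the current date and time
    return datetime.datetime.now().strftime(fmt).format(fname=fname)
Open the file as follows (binary mode, because response.content is bytes):
with open(timeStamped(filename), 'wb') as f:
Then write the data:
    f.write(response.content)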
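A timestamp with one-second resolution can repeat when several images are saved within the same second, so two files with the same name may still collide. Since the images in the question differ by path, one alternative is to fold the URL path into the file name. The sketch below is only an illustration: name_from_url is a hypothetical helper, not part of the original code, and it assumes that replacing '/' with '_' is acceptable for your file names.

```python
from urllib.parse import urlparse


def name_from_url(url):
    """Build a flat file name from the path part of a URL (hypothetical helper)."""
    # Keep only the path, so the scheme, host and query string are dropped
    path = urlparse(url).path.strip('/')
    # Flatten the directories into the name: a/b/logo.png becomes a_b_logo.png
    return path.replace('/', '_')
```

With this, the open call would become open(name_from_url(url), 'wb'). Two URLs only map to the same name if their paths differ solely in where a '/' and a '_' sit, which should be rare.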
Answer 1 (score: 0)
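Add a timestamp to the file name:
import datetime
import os.path

if os.path.isfile(fname):
    # Insert the timestamp before the extension. Avoid '/' in the
    # format, because it is a path separator.
    root, ext = os.path.splitext(fname)
    t = datetime.datetime.now()
    fname = root + t.strftime("_%Y-%m-%d-%H-%M-%S") + ext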
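The counter prefix mentioned in the question can be built on the same existence check. This is a minimal sketch under stated assumptions: unique_name is a hypothetical helper name, and it assumes only one process is writing to the directory, since another writer could create the file between the check and the open call.

```python
import os.path


def unique_name(fname):
    """Return fname, or a counter-prefixed variant if fname already exists."""
    directory, base = os.path.split(fname)
    candidate = fname
    counter = 1
    # Try 1_name, 2_name, ... until an unused name is found
    while os.path.exists(candidate):
        candidate = os.path.join(directory, '{}_{}'.format(counter, base))
        counter += 1
    return candidate
```

In the loop from the question, open(unique_name(filename), 'wb') would then keep the first logo.png and save later ones as 1_logo.png, 2_logo.png, and so on.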