我的任务是每天从本地下载站点下载McAfee病毒定义文件。文件名每天都在变化。下载网站的路径是“http://download.nai.com/products/licensed/superdat/english/intel/7612xdat.exe” 这个ID 7612每天都会因为不同的东西而改变,因此我无法对其进行硬编码。我必须找到一种方法来列出文件名,然后再提供它作为参数等。 在stackoverflow网站上,我找到了某人的脚本,如果有人可以建议如何处理更改文件名,这将对我有用。 这是我将要使用的脚本:
def download(url):
"""Copy the contents of a file from a given URL
to a local file.
"""
import urllib
webFile = urllib.urlopen(url)
localFile = open(url.split('/')[-1], 'w')
localFile.write(webFile.read())
webFile.close()
localFile.close()
if __name__ == '__main__':
import sys
if len(sys.argv) == 2:
try:
download(sys.argv[1])
except IOError:
print 'Filename not found.'
else:
import os
print 'usage: %s http://server.com/path/to/filename' % os.path.basename(sys.argv[0])
有人可以建议我吗? 提前致谢
答案 0 :(得分:0)
这是一个两步的过程。首先抓取索引页面寻找文件。第二,抓住最新版本并下载
import urllib
import lxml.html
import os
import shutil
# index page
pattern_files_url = "http://download.nai.com/products/licensed/superdat/english/intel"
# relative url references based here
pattern_files_base = '/'.join(pattern_files_url.split('/')[:-1])
# scrape the index page for latest file list
doc = lxml.html.parse(pattern_files_url)
pattern_files = [ref for ref in doc.xpath("//a/@href") if ref.endswith('xdat.exe')]
if pattern_files:
pattern_files.sort()
newest = pattern_files[-1]
local_name = newest.split('/')[-1]
# grab it if we don't already have it
if not os.path.exists(local_name):
url = pattern_files_base + '/' + newest
print("downloading %s to %s" % (url, local_name))
remote = urllib.urlopen(url)
print dir(remote)
with open(local_name, 'w') as local:
shutil.copyfileobj(remote, local, length=65536)