Question

I want to download a file from S3 using the s3cmd interface. I am using the command:

s3cmd get s3://db-backups/db/production_dump_2013-09-12_12-00.sql.gz dump1.sql.gz

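The command works fine. Next, I want to automate the download. The directory contains several files with similar names that differ only in their timestamp, for example: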

production_dump_2013-09-12_09-00.sql.gz
production_dump_2013-09-12_12-00.sql.gz
production_dump_2013-09-12_15-00.sql.gz
production_dump_2013-09-12_18-00.sql.gz
production_dump_2013-09-12_21-00.sql.gz

How can I download the latest file? If the file name is known, I can use:

cmd = 's3cmd get s3://voylladb-backups/db/production_dump_2013-09-12_12-00.sql.gz dump1.sql.gz'
args = shlex.split(cmd)
p=subprocess.Popen(args)
p.wait()

How can I modify this (or use some other method) to get the file with the latest timestamp?

Answer 1

You can list the files with s3cmd ls s3://voylladb-backups/db/.

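Then, assuming you get a listing back, you can sort it in reverse order and take the first item. This may not be the most concise way to write it, but it should work: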

import subprocess, re

# Use subprocess.check_output to get the output from the terminal command
lines = subprocess.check_output(
    ["s3cmd", "ls", "s3://voylladb-backups/db/"],
    universal_newlines=True  # return text rather than bytes on Python 3
).splitlines()

# the format is a bit weird so we want to isolate just the s3:// paths
# we'll use a regex search to find the s3:// pattern and any subsequent characters
file_re = re.compile("s3://.+")
files = []

# next we iterate over each line of output from s3cmd ls looking for the s3 paths
for line in lines:
    result = file_re.search(line)
    if result:
        # and add them to our list
        files.append(result.group(0))

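# finally, sort in descending order so the newest file is first, and grab the first item
# (the paths share a prefix and the timestamps are zero-padded, so string order matches time order)
files.sort(reverse=True)
print(files[0])  # s3://voylladb-backups/db/production_dump_2013-09-12_21-00.sql.gz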
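
To go from the newest path to an actual download, the listing step can be combined with the s3cmd get call from the question. The following is only a sketch under a few assumptions: the function names newest_dump and download_newest are made up for illustration, the file names follow the production_dump_YYYY-MM-DD_HH-MM.sql.gz pattern shown in the question, and s3cmd is installed and configured. It parses the timestamp out of each name instead of relying on string order, which may be a little more robust if the prefix ever varies.

```python
import re
import subprocess
from datetime import datetime

# Assumption: backup names look like production_dump_YYYY-MM-DD_HH-MM.sql.gz,
# as in the question. Adjust the pattern if your naming differs.
DUMP_RE = re.compile(
    r"s3://\S+/production_dump_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2})\.sql\.gz"
)


def newest_dump(listing):
    """Return the s3:// path with the latest timestamp in its name, or None."""
    candidates = []
    for match in DUMP_RE.finditer(listing):
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M")
        candidates.append((stamp, match.group(0)))
    if not candidates:
        return None
    # Tuples compare by timestamp first, so max() picks the newest dump.
    return max(candidates)[1]


def download_newest(prefix, target):
    """List prefix with s3cmd, then fetch the newest dump to target."""
    listing = subprocess.check_output(
        ["s3cmd", "ls", prefix], universal_newlines=True
    )
    path = newest_dump(listing)
    if path is None:
        raise RuntimeError("no dump files found under " + prefix)
    # check_call raises CalledProcessError if s3cmd exits with an error.
    subprocess.check_call(["s3cmd", "get", path, target])
    return path
```

Calling download_newest("s3://voylladb-backups/db/", "dump1.sql.gz") would then take the place of the hard-coded s3cmd get command in the question.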
