I have various tar files in a folder named "supertar", with names like:
esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar,
esarchive--Jackson-HQ-112-ecb5ab6a-c199-402d-9a8a-8c54c8901d66-06092017-4.tar,
esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05202017-4.tar,
esarchive--Jackson-HQ-112-ecb5ab6a-c199-402d-9a8a-8c54c8901d66-06012017-4.tar,
esarchive--Jonah-7fbbbc6c-8463-4ec1-9bde-3fc5429311e5-06092017-4
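How can I extract the latest .tar file name for each customer (e.g. Mona, Jackson, Jonah, as they appear in the respective file names), based on the date value near the end of the file name, so that I end up with a variable holding: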
esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar,
esarchive--Jackson-HQ-112-ecb5ab6a-c199-402d-9a8a-8c54c8901d66-06092017-4.tar,
esarchive--Jonah-7fbbbc6c-8463-4ec1-9bde-3fc5429311e5-06092017-4
So far I have written the following code:
for file in glob.glob("*.tar"):
    # print "The File Being Untarred is:",file
    file_date_str = file.split('-')[-2]
    datetime_obj = datetime.datetime.strptime(file_date_str, '%m%d%Y')
    a=re.match("esarchive--(\w+)-(\w+)-(\w+)", file).group(1)# Gets Mona from file name
    b=re.match("esarchive--(\w+)-(\w+)-(\w+)", file).group(2)# Gets AB from file name
    c=re.match("esarchive--(\w+)-(\w+)-(\w+)", file).group(3)# Gets Test226 from file name
    s = a+'-'+b+'-'+c
    d=s.lower()
    my_dict={}
    date = datetime.date.today()
    print(s)
    try:
        (latest_date, _) = my_dict['name'] # _ has file name, which you don't want to compare.
        if date > latest_date:
            # If entry for this name exists,
            # Replace the info with latest date.
            my_dict['name'] = (date,file)
    except KeyError:
        # No info for this name in dictionary.
        my_dict['name'] = (date,file)
    print "The File Being Untarred is:",my_dict['name']
    tar = tarfile.open("/home/chetan/Desktop/supertar/"+my_dict['name'][1])
    tar.extractall(path="/home/chetan/Documents/chetan-dump-es") # untar file into same directory
    tar.close()
What I get is a list of all the files, not just the latest ones.
Answer 0 (score: 1)
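I think your data looks familiar... didn't we already cover it in your last question? With minor tweaks, the previous answer can be adapted to this case: you don't need a list of all files, only the highest value per customer, and you can extract the 'customer' the same way you extract the date (you don't need full date parsing at all, as shown in my previous answer).
Something like: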
import glob

def parse_date(name, offset=-10):  # let's re-use our convenience function
    try:
        date_str = name[offset:offset+8]  # the MMDDYYYY part of the file name
        # reorder to YYYYMMDD so that plain integer comparison sorts by date
        return int(date_str[-4:] + date_str[:2] + date_str[2:4])
    except (IndexError, TypeError, ValueError):  # invalid file name
        return -1

result = {}  # use this as our result / lookup table
for file_name in glob.glob("*.tar"):
    # for customer name, skip `esarchive--` and pick everything until the next dash
    customer = file_name[11:file_name.find("-", 11)]
    date = parse_date(file_name, -14)
    # now replace our stored value if it's older than the date in our current file name
    if result.get(customer, [-1])[0] < date:
        result[customer] = [date, file_name]  # store the parsed date and file name
Then you can use it like this (assuming the data you posted):
for k, v in result.items():
    print("Customer: {}\n\tDate: {}\n\tFile: {}".format(k, v[0], v[1]))
# prints:
# Customer: Jonah
#     Date: 20170609
#     File: esarchive--Jonah-7fbbbc6c-8463-4ec1-9bde-3fc5429311e5-06092017-4.tar
# Customer: Jackson
#     Date: 20170609
#     File: esarchive--Jackson-HQ-112-ecb5ab6a-c199-402d-9a8a-8c54c8901d66-06092017-4.tar
# Customer: Mona
#     Date: 20170522
#     File: esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar

# or if you just want the list of file names:
file_names = [entry[1] for entry in result.values()]
# ['esarchive--Jonah-7fbbbc6c-8463-4ec1-9bde-3fc5429311e5-06092017-4.tar',
#  'esarchive--Jackson-HQ-112-ecb5ab6a-c199-402d-9a8a-8c54c8901d66-06092017-4.tar',
#  'esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar']
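If you would rather not depend on fixed character offsets, the same "keep the highest date per customer" idea can be sketched with a regular expression and `datetime.strptime`. This is only a rough variant, not a replacement for the code above: the helper name `latest_per_customer`, the pattern `NAME_RE` and the `sample` list are made up for illustration, and the pattern assumes every name looks like `esarchive--<customer>-...-<MMDDYYYY>-<n>.tar` (so the Jonah entry is assumed to end in `.tar` as well).

```python
import re
from datetime import datetime

# Assumed layout: esarchive--<customer>-...-<MMDDYYYY>-<n>.tar
NAME_RE = re.compile(r"^esarchive--([^-]+)-.*-(\d{8})-\d+\.tar$")

def latest_per_customer(file_names):
    """Map each customer to the file name that carries its most recent date."""
    latest = {}  # customer -> (date, file_name)
    for name in file_names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed layout
        customer, date_str = match.groups()
        try:
            date = datetime.strptime(date_str, "%m%d%Y").date()
        except ValueError:
            continue  # eight digits that are not a valid MMDDYYYY date
        # keep this file only if it is newer than what we stored so far
        if customer not in latest or date > latest[customer][0]:
            latest[customer] = (date, name)
    return {customer: entry[1] for customer, entry in latest.items()}

# Hypothetical sample data, modelled on the names in the question
sample = [
    "esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar",
    "esarchive--Jackson-HQ-112-ecb5ab6a-c199-402d-9a8a-8c54c8901d66-06092017-4.tar",
    "esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05202017-4.tar",
    "esarchive--Jackson-HQ-112-ecb5ab6a-c199-402d-9a8a-8c54c8901d66-06012017-4.tar",
    "esarchive--Jonah-7fbbbc6c-8463-4ec1-9bde-3fc5429311e5-06092017-4.tar",
]
newest = latest_per_customer(sample)
```

In a real run you would pass `glob.glob("*.tar")` instead of `sample`, and each value in `newest` is a file name you can hand to `tarfile.open`.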