How do I sort a Python list by the date embedded in each string, newest first?
['', 'q//Attachments/Swoop_coverletter_311386_20120103.doc', 'q//Attachments/Swoop_RESUME_311386_20091012.doc', 'q//Attachments/Swoop_Resume_311386_20100901.doc', 'q//Attachments/Swoop_reSume_311386_20120103.doc', 'q//Attachments/Swoop_coverletter_311386_20100901.doc', 'q//Attachments/Swoop_coverletter_311386_20091012.doc']
That is the list; the expected result is:
['q//Attachments/Swoop_coverletter_311386_20120103.doc','q//Attachments/Swoop_reSume_311386_20120103.doc','q//Attachments/Swoop_Resume_311386_20100901.doc','q//Attachments/Swoop_coverletter_311386_20100901.doc','q//Attachments/Swoop_RESUME_311386_20091012.doc','q//Attachments/Swoop_coverletter_311386_20091012.doc','']
I wrote a script, but it does not sort the list; it only prints a single value:
a = ['q//Attachments/Swoop_coverletter_311386_20120103.doc','q//Attachments/Swoop_reSume_311386_20120103.doc','q//Attachments/Swoop_Resume_311386_20100901.doc','q//Attachments/Swoop_coverletter_311386_20100901.doc','q//Attachments/Swoop_RESUME_311386_20091012.doc','q//Attachments/Swoop_coverletter_311386_20091012.doc','']
print(max(a))
Result:
q//Attachments/Swoop_reSume_311386_20120103.doc
How can I get the following expected output?
['q//Attachments/Swoop_coverletter_311386_20120103.doc','q//Attachments/Swoop_reSume_311386_20120103.doc','q//Attachments/Swoop_Resume_311386_20100901.doc','q//Attachments/Swoop_coverletter_311386_20100901.doc','q//Attachments/Swoop_RESUME_311386_20091012.doc','q//Attachments/Swoop_coverletter_311386_20091012.doc','']
Answer 0 (score: 3)
Write a function that uses a regular expression to extract the date from each string, and use it as the key for sorted:
import re

l = ['',
     'q//Attachments/Swoop_coverletter_311386_20120103.doc',
     'q//Attachments/Swoop_RESUME_311386_20091012.doc',
     'q//Attachments/Swoop_Resume_311386_20100901.doc',
     'q//Attachments/Swoop_reSume_311386_20120103.doc',
     'q//Attachments/Swoop_coverletter_311386_20100901.doc',
     'q//Attachments/Swoop_coverletter_311386_20091012.doc']

def get_date(line):
    # Capture the 8 digits between the last underscore and ".doc"
    pattern = r'.*_(\d{8})\.doc'
    m = re.match(pattern, line)
    if m:
        return int(m.group(1))
    else:
        return -1  # or do something else with lines that contain no date

# reverse=True puts the newest date first; lines without a date end up last
print(sorted(l, key=get_date, reverse=True))
This prints:
['q//Attachments/Swoop_coverletter_311386_20120103.doc',
'q//Attachments/Swoop_reSume_311386_20120103.doc',
'q//Attachments/Swoop_Resume_311386_20100901.doc',
'q//Attachments/Swoop_coverletter_311386_20100901.doc',
'q//Attachments/Swoop_RESUME_311386_20091012.doc',
'q//Attachments/Swoop_coverletter_311386_20091012.doc',
'']
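Answer 1 (score: 0)
I think the built-in str.rpartition('_') (https://docs.python.org/3/library/stdtypes.html#str.rpartition) makes this easier to solve.
I am assuming that all of your file names follow the same format, in which case the last element of the tuple it returns will always be <date>.doc. Then you only need to strip the .doc.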
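A minimal sketch of that idea, assuming every non-empty entry ends in `_<8 digits>.doc`. The helper name `date_key` and the `-1` fallback for entries without a date are illustrative choices, not part of the original answer:

```python
def date_key(path):
    """Return the trailing 8-digit date as an int, or -1 if none is found."""
    # rpartition returns (head, separator, tail); for well-formed names
    # the tail is '<date>.doc'. For '' all three parts are empty strings.
    tail = path.rpartition('_')[2]
    date_part = tail.partition('.')[0]  # drop the '.doc' extension
    if len(date_part) == 8 and date_part.isdigit():
        return int(date_part)
    return -1  # assumption: entries without a date sort last

files = ['',
         'q//Attachments/Swoop_coverletter_311386_20120103.doc',
         'q//Attachments/Swoop_RESUME_311386_20091012.doc',
         'q//Attachments/Swoop_Resume_311386_20100901.doc',
         'q//Attachments/Swoop_reSume_311386_20120103.doc',
         'q//Attachments/Swoop_coverletter_311386_20100901.doc',
         'q//Attachments/Swoop_coverletter_311386_20091012.doc']

result = sorted(files, key=date_key, reverse=True)
print(result)
```

Because the dates are in YYYYMMDD form, comparing them as integers gives chronological order, so no date parsing is needed here.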
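Answer 2 (score: 0)
You can try an alternative one-liner (of sorts). You first have to clean the list by removing the empty elements.
from datetime import datetime
# list(...) keeps this a real list on Python 3, where filter returns an iterator
given_list = list(filter(None, given_list))
sorted(given_list, key=lambda x: datetime.strptime(x.split(".")[0][-8:], "%Y%m%d"), reverse=True)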
Or simplify it as in BioGeek's answer: instead of using datetime, just convert the date to int and sort on that.
given_list = list(filter(None, given_list))
sorted(given_list, key=lambda x: int(x.split(".")[0][-8:]), reverse=True)
Output:
['q//Attachments/Swoop_coverletter_311386_20120103.doc',
'q//Attachments/Swoop_reSume_311386_20120103.doc',
'q//Attachments/Swoop_Resume_311386_20100901.doc',
'q//Attachments/Swoop_coverletter_311386_20100901.doc',
'q//Attachments/Swoop_RESUME_311386_20091012.doc',
'q//Attachments/Swoop_coverletter_311386_20091012.doc']
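Note that filtering drops the empty entry, whereas the question's expected output keeps it at the end. One possible way to keep it, sketched here under the assumption that any entry without a parsable date should sort last, is to skip the filtering and fall back to a sentinel key. The helper name `date_or_min` is hypothetical:

```python
from datetime import datetime

def date_or_min(path):
    """Parse the YYYYMMDD just before the extension; datetime.min if absent."""
    stem = path.split(".")[0]
    try:
        return datetime.strptime(stem[-8:], "%Y%m%d")
    except ValueError:
        # Assumption: entries with no valid date (such as '') go last
        return datetime.min

given_list = ['',
              'q//Attachments/Swoop_coverletter_311386_20120103.doc',
              'q//Attachments/Swoop_RESUME_311386_20091012.doc',
              'q//Attachments/Swoop_Resume_311386_20100901.doc',
              'q//Attachments/Swoop_reSume_311386_20120103.doc',
              'q//Attachments/Swoop_coverletter_311386_20100901.doc',
              'q//Attachments/Swoop_coverletter_311386_20091012.doc']

result = sorted(given_list, key=date_or_min, reverse=True)
print(result)
```

Using datetime.strptime rather than int also rejects strings that are eight digits but not a real calendar date.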