Python: sort a list by date

Time: 2016-03-09 11:44:31

Tags: python sorting

How do I sort a Python list of strings by the date contained in each string, with the latest (maximum) date first?
['', 'q//Attachments/Swoop_coverletter_311386_20120103.doc', 'q//Attachments/Swoop_RESUME_311386_20091012.doc', 'q//Attachments/Swoop_Resume_311386_20100901.doc', 'q//Attachments/Swoop_reSume_311386_20120103.doc', 'q//Attachments/Swoop_coverletter_311386_20100901.doc', 'q//Attachments/Swoop_coverletter_311386_20091012.doc']

That's the list. The expected result is:

['q//Attachments/Swoop_coverletter_311386_20120103.doc','q//Attachments/Swoop_reSume_311386_20120103.doc','q//Attachments/Swoop_Resume_311386_20100901.doc','q//Attachments/Swoop_coverletter_311386_20100901.doc','q//Attachments/Swoop_RESUME_311386_20091012.doc','q//Attachments/Swoop_coverletter_311386_20091012.doc','']

I wrote a script, but it doesn't sort anything. It only prints a single value:

a = ['q//Attachments/Swoop_coverletter_311386_20120103.doc','q//Attachments/Swoop_reSume_311386_20120103.doc','q//Attachments/Swoop_Resume_311386_20100901.doc','q//Attachments/Swoop_coverletter_311386_20100901.doc','q//Attachments/Swoop_RESUME_311386_20091012.doc','q//Attachments/Swoop_coverletter_311386_20091012.doc','']
print(max(a))

Result:

q//Attachments/Swoop_reSume_311386_20120103.doc
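The reason for this result is that max() compares the strings themselves, character by character, and returns only the single largest one. Every name starts with 'q//Attachments/Swoop_', so the next character decides. In ASCII, 'r' (reSume) sorts after 'c' (coverletter), which sorts after 'R' (RESUME/Resume), so the date is never considered. Here is a quick check that uses the list from the question (the variable name `files` is only illustrative):

```python
# The list from the question, in the order it was first posted.
files = ['',
         'q//Attachments/Swoop_coverletter_311386_20120103.doc',
         'q//Attachments/Swoop_RESUME_311386_20091012.doc',
         'q//Attachments/Swoop_Resume_311386_20100901.doc',
         'q//Attachments/Swoop_reSume_311386_20120103.doc',
         'q//Attachments/Swoop_coverletter_311386_20100901.doc',
         'q//Attachments/Swoop_coverletter_311386_20091012.doc']

# max() does a plain string comparison: the first differing character wins.
# It returns one element, not a sorted list.
print(max(files))

# Code points of the characters that differ right after the shared prefix.
print(ord('r'), ord('c'), ord('R'))  # 114 99 82
```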

How can I get the expected output shown below?

Expected output:

['q//Attachments/Swoop_coverletter_311386_20120103.doc','q//Attachments/Swoop_reSume_311386_20120103.doc','q//Attachments/Swoop_Resume_311386_20100901.doc','q//Attachments/Swoop_coverletter_311386_20100901.doc','q//Attachments/Swoop_RESUME_311386_20091012.doc','q//Attachments/Swoop_coverletter_311386_20091012.doc','']

3 answers:

Answer 0 (score: 3)

Write a function that extracts the date from each string with a regular expression, and pass it as the key to sorted:

import re

l = ['',
     'q//Attachments/Swoop_coverletter_311386_20120103.doc',
     'q//Attachments/Swoop_RESUME_311386_20091012.doc',
     'q//Attachments/Swoop_Resume_311386_20100901.doc',
     'q//Attachments/Swoop_reSume_311386_20120103.doc',
     'q//Attachments/Swoop_coverletter_311386_20100901.doc',
     'q//Attachments/Swoop_coverletter_311386_20091012.doc']

def get_date(line):
    pattern = r'.*_(\d{8})\.doc'  # raw string; \. matches a literal dot
    m = re.match(pattern, line)
    if m:
        return int(m.group(1))
    else:
        return -1 # or do something else with lines that contain no date


print(sorted(l, key=get_date, reverse=True))

Prints:

['q//Attachments/Swoop_coverletter_311386_20120103.doc', 
 'q//Attachments/Swoop_reSume_311386_20120103.doc', 
 'q//Attachments/Swoop_Resume_311386_20100901.doc', 
 'q//Attachments/Swoop_coverletter_311386_20100901.doc', 
 'q//Attachments/Swoop_RESUME_311386_20091012.doc', 
 'q//Attachments/Swoop_coverletter_311386_20091012.doc', 
 '']
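Two details make this work. First, YYYYMMDD dates have a fixed width, so comparing them as integers gives chronological order. Second, files that share a date (for example, the two 20120103 files) keep their original relative order. Python's sort is stable, and reverse=True preserves that stability. That is why the coverletter file stays ahead of the reSume file in the output. The short sketch below demonstrates the second point with made-up (name, date) pairs. The names `docs` and `newest_first` are hypothetical.

```python
# Hypothetical (name, date) pairs modelled on the question's files;
# 'coverletter' and 'reSume' share the same date.
docs = [('coverletter', 20120103), ('RESUME', 20091012), ('reSume', 20120103)]

# Sort newest first. Items with equal dates keep their original relative
# order, because Python's sort is stable even when reverse=True.
newest_first = sorted(docs, key=lambda d: d[1], reverse=True)
print(newest_first)
# [('coverletter', 20120103), ('reSume', 20120103), ('RESUME', 20091012)]
```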

Answer 1 (score: 0)

I think the built-in method str.rpartition('_') (https://docs.python.org/3/library/stdtypes.html#str.rpartition) makes this easier to solve.

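I'm assuming, of course, that all your files follow the same format. In that case, the last element of the tuple returned by that method is always <date>.doc, and you only need to strip the .doc.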
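The answer above doesn't include code. Here is a minimal sketch of the idea. The helper name `rpartition_key` is hypothetical, and it assumes every non-empty name ends in `_<YYYYMMDD>.doc`. Note that `''.rpartition('_')` returns `('', '', '')`, so the empty string needs a fallback key. Otherwise `int('')` would raise a ValueError.

```python
files = ['',
         'q//Attachments/Swoop_coverletter_311386_20120103.doc',
         'q//Attachments/Swoop_RESUME_311386_20091012.doc',
         'q//Attachments/Swoop_Resume_311386_20100901.doc',
         'q//Attachments/Swoop_reSume_311386_20120103.doc',
         'q//Attachments/Swoop_coverletter_311386_20100901.doc',
         'q//Attachments/Swoop_coverletter_311386_20091012.doc']

def rpartition_key(name):
    """Return the date as an int, or -1 when the name has no date."""
    # 'x_y_20120103.doc'.rpartition('_') -> ('x_y', '_', '20120103.doc')
    tail = name.rpartition('_')[2]
    # Strip the '.doc' extension (assumed to be present on real file names).
    date_str = tail[:-len('.doc')] if tail.endswith('.doc') else ''
    # Names without a date get -1, so they sort last in descending order.
    return int(date_str) if date_str.isdigit() else -1

result = sorted(files, key=rpartition_key, reverse=True)
print(result)
```

Unlike a regular expression, this relies entirely on the file names having a fixed layout, which is the assumption the answer states.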

Answer 2 (score: 0)

You could try this alternative (sort of) one-line solution. First, clean up the list by removing the empty elements:

from datetime import datetime
given_list = filter(None, given_list)  # drop empty strings
sorted(given_list, key=lambda x: datetime.strptime(x.split(".")[0][-8:], "%Y%m%d"), reverse=True)

Or simplify it as in BioGeek's answer: instead of using datetime, just convert the date to an int and sort on that.

given_list = filter(None, given_list)
sorted(given_list, key=lambda x: int(x.split(".")[0][-8:]), reverse=True)

Output:

['q//Attachments/Swoop_coverletter_311386_20120103.doc', 
 'q//Attachments/Swoop_reSume_311386_20120103.doc',
 'q//Attachments/Swoop_Resume_311386_20100901.doc',
 'q//Attachments/Swoop_coverletter_311386_20100901.doc', 
 'q//Attachments/Swoop_RESUME_311386_20091012.doc',
 'q//Attachments/Swoop_coverletter_311386_20091012.doc']
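Because the empty string is filtered out, this output has six items. The question's expected output keeps '' at the end. If you want to keep it, one option is to give names without a date the key datetime.min so they sort last when reverse=True. This is a sketch that isn't part of the original answer. The helper name `parse_date` is hypothetical, and it assumes every non-empty name ends in `<YYYYMMDD>.doc` with no other dots.

```python
from datetime import datetime

files = ['',
         'q//Attachments/Swoop_coverletter_311386_20120103.doc',
         'q//Attachments/Swoop_RESUME_311386_20091012.doc',
         'q//Attachments/Swoop_Resume_311386_20100901.doc',
         'q//Attachments/Swoop_reSume_311386_20120103.doc',
         'q//Attachments/Swoop_coverletter_311386_20100901.doc',
         'q//Attachments/Swoop_coverletter_311386_20091012.doc']

def parse_date(name):
    """Parse the YYYYMMDD date before the extension; datetime.min if empty."""
    if not name:
        return datetime.min  # smallest datetime, so it ends up last
    return datetime.strptime(name.split(".")[0][-8:], "%Y%m%d")

result = sorted(files, key=parse_date, reverse=True)
print(result)
```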