I have the following list of strings:
files = ['hulu_delta_20150517.xml', 'hulu_delta_20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml', 'hulu_full20150517.xml', 'hulu_full20150518.xml']
I want to sort it by the date in each string. How can I do that? So far, I have:
sorted(files, key=lambda x: re.search(r'\d{8}',s).group())
But this just gives me the same list as the original.
Answer 0 (score: 1)
Make sure your variable name is correct: inside the lambda expression it should be x, not s:
>>> sorted(files, key=lambda x: re.search(r'\d{8}',x).group())
['hulu_delta_20150517.xml', 'hulu_full20150517.xml',
'hulu_delta_20150518.xml', 'hulu_full20150518.xml',
'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml',
'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml',
'hulu_delta_20150523.xml']
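A likely reason the original call returned the list unchanged instead of raising a NameError: if a variable named s already existed in the session, every element received the same key, and because sorted is stable, elements with equal keys keep their input order. A minimal sketch of that effect, assuming s held some leftover filename (the value below is hypothetical, as is the shortened file list):

```python
import re

files = ['hulu_delta_20150519.xml', 'hulu_full20150517.xml', 'hulu_delta_20150518.xml']

# Hypothetical leftover variable from earlier in the session.
s = 'hulu_delta_20150517.xml'

# Bug: the key searches s and ignores x, so every element gets the same key
# and the stable sort leaves the input order untouched.
unchanged = sorted(files, key=lambda x: re.search(r'\d{8}', s).group())

# Fix: search x, the element currently being keyed.
by_date = sorted(files, key=lambda x: re.search(r'\d{8}', x).group())

print(unchanged == files)
print(by_date)
```

The same stability explains why, in the output above, each hulu_delta file comes before the hulu_full file with the same date: that is the order they had in the input.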
Answer 1 (score: 1)
I bullet-proofed this a bit more than required...
This verifies that the matched run of digits is exactly 8 digits long, since a longer run such as '001001010100' probably wasn't intended as a date.
It then verifies that it's a valid date. (Hat tip to anmol_uppal; that is much easier than slicing up the string for datetime.date.)
Date strings are left as strings, since they will sort correctly. All non-dated strings are sorted in ASCII-betical order, and show up first in the output.
import re
import time

def sort_by_iso_date(strings):
    # Pre-sort in ASCII-betical order, then sort by ISO date string.
    # This makes the final order predictable without complicating the
    # key function.
    strings = sorted(strings)
    return sorted(strings, key=first_iso_date_string)

def first_iso_date_string(s):
    '''
    Returns the first string of _exactly_ 8 digits in the given
    string s, or '' if no 8-digit sequence was found.
    '''
    date_regex = r'''
        (?<!\d)          # Not preceded by a digit.
        (?P<date>\d{8})  # Match _exactly_ 8 digits.  Name the group 'date'.
        (?!\d)           # Not followed by a digit.
    '''
    pattern = re.compile(date_regex, re.X)
    match = re.search(pattern, s)
    no_date_found = ''
    if match is None:
        return no_date_found
    iso_date_string = match.group('date')
    if not is_valid_date(iso_date_string):
        return no_date_found
    return iso_date_string

def is_valid_date(yyyymmdd):
    try:
        __ = time.strptime(yyyymmdd, '%Y%m%d')
    except ValueError:
        return False
    return True
Answer 2 (score: 0)
You can also sort by converting the date string to a struct_time object and then sorting on that object.
import time
files = ['hulu_delta_20150517.xml', 'hulu_delta_20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml', 'hulu_full20150517.xml', 'hulu_full20150518.xml']
a = time.strptime("20150517", "%Y%m%d")
b = sorted(files, key = lambda x:time.strptime(x[-12:-4], "%Y%m%d"))
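The slice x[-12:-4] assumes every name ends with an 8-digit date followed by a four-character extension such as .xml. A sketch that drops that assumption by pulling the digits out with a regex and parsing them with datetime.strptime. The helper name date_key and the sample names are made up for illustration, and placing names without a valid date first is a choice made here, not something from the answer:

```python
import re
from datetime import date, datetime

def date_key(name):
    """Return the first exactly-8-digit date in name, or date.min if none."""
    match = re.search(r'(?<!\d)\d{8}(?!\d)', name)
    if match is None:
        return date.min  # no date found: sort to the front
    try:
        return datetime.strptime(match.group(), '%Y%m%d').date()
    except ValueError:
        return date.min  # 8 digits that are not a real calendar date

# Hypothetical names: a longer extension and a file with no date at all.
sample = ['hulu_delta_20150518.xml.gz', 'notes.txt', 'hulu_full20150517.xml']
result = sorted(sample, key=date_key)
print(result)
```

Since YYYYMMDD strings already sort chronologically as plain text, parsing is mainly useful as validation; the string key from the earlier answers gives the same order for well-formed dates.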
Answer 3 (score: 0)
This isn't elegant, but I gave it a try.
import re
from operator import itemgetter
files = ['hulu_delta_20150517.xml', 'hulu_delta_20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml', 'hulu_full20150517.xml', 'hulu_full20150518.xml']
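# Take the first run of digits in each filename as its date string.
num = [re.findall(r'\d+', i)[0] for i in files]
# Pair each filename with its date, sort by the date, and keep the filenames.
print([elem[0] for elem in sorted(zip(files, num), key=itemgetter(1))])
Output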
['hulu_delta_20150517.xml', 'hulu_full20150517.xml', 'hulu_delta_20150518.xml', 'hulu_full20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml']