Question

I have the following list of strings:

files = ['hulu_delta_20150517.xml', 'hulu_delta_20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml', 'hulu_full20150517.xml', 'hulu_full20150518.xml']

I want to sort it by the date in each string. How can I do that? So far I have:

sorted(files, key=lambda x: re.search(r'\d{8}',s).group())

But this just gives me back the same list as the original.

Answer 1

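Make sure your variable names are correct: inside the lambda expression it should be x, not s. As written, the key ignores x, so if a variable s already exists in your session, every element gets the same key and the stable sort returns the list in its original order. With x: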

>>> sorted(files, key=lambda x: re.search(r'\d{8}',x).group())
['hulu_delta_20150517.xml', 'hulu_full20150517.xml', 
 'hulu_delta_20150518.xml', 'hulu_full20150518.xml', 
 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 
 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml',
 'hulu_delta_20150523.xml']
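
One thing to watch with this key: re.search returns None for a name that has no 8-digit run, and calling .group() on None raises AttributeError. Below is a minimal sketch of a guarded key function. The names DATE_RE and date_key, and the extra 'readme.txt' entry, are made up for illustration and are not part of the question.

```python
import re

# Compile once; the pattern is the same one used in the answer above.
DATE_RE = re.compile(r'\d{8}')

def date_key(name):
    """Return the first 8-digit run in name, or '' if there is none."""
    match = DATE_RE.search(name)
    return match.group() if match else ''

# 'readme.txt' is a hypothetical undated entry added for the demo.
files = ['hulu_delta_20150518.xml', 'readme.txt', 'hulu_full20150517.xml']
result = sorted(files, key=date_key)
print(result)
```

Returning '' puts undated names first, since the empty string sorts before any digit string. Whether they should go first, last, or raise an error is a design choice for your data.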

Answer 2

I bullet-proofed this a bit more than required...

This checks that the matched run of digits is exactly 8 digits long, since something like '001001010100' probably wasn't intended as a date.

It then verifies that the 8 digits form a valid date. (Hat tip to anmol_uppal: much easier than slicing up the string for datetime.date.)

Date strings are left as strings, since YYYYMMDD strings already sort in chronological order. All non-dated strings are sorted in ASCII-betical order and show up first in the output.

import re
import time

def sort_by_iso_date(strings):
    # Pre-sort in ASCII-betical order, then sort by ISO date string.
    # This makes the final order predictable without complicating the
    # key function.
    strings = sorted(strings)
    return sorted(strings, key=first_iso_date_string)


def first_iso_date_string(s):
    '''
    Returns the first string of _exactly_ 8 digits in the given
    string s, or '' if no 8-digit sequence was found.
    '''
    date_regex = r'''
      (?<!\d)          # Not preceded by a digit.
      (?P<date>\d{8})  # Match _exactly_ 8 digits.  Name the group 'date'.
      (?!\d)           # Not followed by a digit.
    '''
    pattern = re.compile(date_regex, re.X)
    match = re.search(pattern, s)
    no_date_found = ''
    if match is None:
        return no_date_found
    iso_date_string = match.group('date')
    if not is_valid_date(iso_date_string):
        return no_date_found
    return iso_date_string


def is_valid_date(yyyymmdd):
    try:
        __ = time.strptime(yyyymmdd, '%Y%m%d')
    except ValueError:
        return False
    return True
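
Two of the claims in this answer are easy to check on their own. The sketch below compares a plain \d{8} pattern with the lookaround version on a 12-digit run, and confirms that YYYYMMDD strings sort the same way as the dates they represent. LOOSE, STRICT, the sample file name, and the stamps list are hypothetical names and data chosen for the demo.

```python
import re
from datetime import datetime

LOOSE = re.compile(r'\d{8}')                # any 8 consecutive digits
STRICT = re.compile(r'(?<!\d)\d{8}(?!\d)')  # exactly 8, no digit on either side

long_run = 'log_001001010100.txt'           # made-up name with a 12-digit run
loose_hit = LOOSE.search(long_run)          # matches the first 8 of the 12 digits
strict_hit = STRICT.search(long_run)        # no match, so this is None

# Zero-padded YYYYMMDD strings: plain string order equals date order.
stamps = ['20150523', '20141231', '20150517', '20150101']
by_string = sorted(stamps)
by_date = sorted(stamps, key=lambda s: datetime.strptime(s, '%Y%m%d'))

print(loose_hit.group(), strict_hit, by_string == by_date)
```

The equivalence holds because every field is fixed-width and ordered from most to least significant. It would not hold for formats such as DDMMYYYY.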

Answer 3

You can also sort by converting the date part of each string to a struct_time object and sorting on that object.

import time

files = ['hulu_delta_20150517.xml', 'hulu_delta_20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml', 'hulu_full20150517.xml', 'hulu_full20150518.xml']

a = time.strptime("20150517", "%Y%m%d")

b = sorted(files, key = lambda x:time.strptime(x[-12:-4], "%Y%m%d"))
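
Note that the slice x[-12:-4] assumes every name ends with 8 digits followed by a 4-character extension such as '.xml'. If that might not hold, one option is to find the digits with a regex first and then parse them. This is only a sketch: parsed_date and the sample names with other extensions are invented for illustration.

```python
import re
from datetime import datetime

def parsed_date(name):
    """Parse the first 8-digit run in name as a YYYYMMDD date.

    Raises AttributeError if there is no 8-digit run, and ValueError
    if the digits are not a real calendar date.
    """
    digits = re.search(r'\d{8}', name).group()
    return datetime.strptime(digits, '%Y%m%d').date()

# Hypothetical names whose extensions are not all 4 characters long.
files = ['hulu_delta_20150518.xml.gz',
         'hulu_full20150517.xml',
         'hulu_delta_20150517.json']
result = sorted(files, key=parsed_date)
print(result)
```

Parsing to a date object costs a little more than comparing strings, but it rejects impossible dates such as 20150230 instead of silently sorting them.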

Answer 4

This isn't elegant, but I gave it a try.

import re
from operator import itemgetter
files = ['hulu_delta_20150517.xml', 'hulu_delta_20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml', 'hulu_full20150517.xml', 'hulu_full20150518.xml']
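# Extract the first run of digits (the date) from each file name.
num = [re.findall(r'\d+', i)[0] for i in files]
# Pair each name with its date, sort by the date, then keep only the names.
print([elem[0] for elem in sorted(zip(files, num), key=itemgetter(1))])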

Output

['hulu_delta_20150517.xml', 'hulu_full20150517.xml', 'hulu_delta_20150518.xml', 'hulu_full20150518.xml', 'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml', 'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml', 'hulu_delta_20150523.xml']
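
The zip and itemgetter combination above is a decorate-sort-undecorate pattern. As a rough comparison, here is the same ordering produced by computing the digits inside the key function. The variable names nums, dsu, and direct are my own.

```python
import re
from operator import itemgetter

files = ['hulu_delta_20150517.xml', 'hulu_delta_20150518.xml',
         'hulu_delta_20150519.xml', 'hulu_delta_20150520.xml',
         'hulu_delta_20150521.xml', 'hulu_delta_20150522.xml',
         'hulu_delta_20150523.xml', 'hulu_full20150517.xml',
         'hulu_full20150518.xml']

# Decorate-sort-undecorate: pair each name with its digits, sort, unpack.
nums = [re.findall(r'\d+', name)[0] for name in files]
dsu = [name for name, _ in sorted(zip(files, nums), key=itemgetter(1))]

# Same ordering with the digits computed inside the key function.
direct = sorted(files, key=lambda name: re.findall(r'\d+', name)[0])

print(dsu == direct)
```

Both versions rely on sorted being stable, which is why the delta file comes before the full file for the same date: that is their order in the input list.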

How to sort a list by the date in each string
