Question

I have a file of deleted tweets (for a class project). Right now the lines in the file look like this:

@soandso something something <a href="http://pic.twitter.com/aphoto</a><a href="a link" target="_blank">Permalink</a> 1:40 PM - 17 Feb 2016<br><br>
@soandso something something <a href="http://pic.twitter.com/aphoto</a><a href="a link" target="_blank">Permalink</a> 1:32 PM - 16 Feb 2016<br><br>

I'm trying to sort the lines in the file by date. This is what I've cobbled together so far:

import re
from datetime import datetime

# Capture everything between the last </a> on a line and the trailing <br><br>
when = re.compile(r".+</a>(.+)<br><br>")

with open('tweets.txt', 'r') as infile:
    sortme = infile.read()

    for match in re.finditer(when, sortme):
        tweet = match.group(0)  # the whole matched line
        stamp = datetime.strptime(match.group(1), " %I:%M %p - %d %b %Y")
        print(stamp)

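This prints every date in the file, converted from the form 1:40 PM - 17 Feb 2016 to 2016-02-17 13:40:00, which I believe is a datetime. For the past few days I've been searching high and low for hints on how to sort all the lines in the file by datetime. Thanks for your help!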

Answer 1

> I've been searching high and low for the past few days for hints on how I could sort all the lines in the file by datetime.

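import re
from datetime import datetime

def get_time(line):
    # [^<]+? cannot run past a '<', so only the text after the last </a> is captured
    match = re.search(r"</a>\s*([^<]+?)\s*<br><br>", line)
    if match:
        return datetime.strptime(match.group(1), "%I:%M %p - %d %b %Y")
    return datetime.min  # lines without a timestamp sort first

with open('tweets.txt') as f:
    lines = f.readlines()
lines.sort(key=get_time)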

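This assumes the times are monotonic over the given period (e.g., there are no DST transitions); otherwise, you should convert the input times to UTC (or to POSIX timestamps) first.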
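A minimal end-to-end sketch of the same approach that also writes the sorted lines back out. The names `TIME_RE`, `TIME_FMT`, `sort_lines`, and `sort_tweet_file` are hypothetical, as is writing to a separate output file; like the answer, it assumes the timestamps are local times with no DST transition in between.

```python
import re
from datetime import datetime

# Capture the timestamp between a closing </a> and the trailing <br><br>.
# [^<]+? cannot run past a '<', so only the text after the last </a> can match.
TIME_RE = re.compile(r"</a>\s*([^<]+?)\s*<br><br>")
TIME_FMT = "%I:%M %p - %d %b %Y"


def get_time(line):
    """Return the tweet's timestamp, or datetime.min if the line has none."""
    match = TIME_RE.search(line)
    if match:
        return datetime.strptime(match.group(1), TIME_FMT)
    return datetime.min


def sort_lines(lines):
    """Return a new list of lines ordered from oldest to newest tweet."""
    return sorted(lines, key=get_time)


def sort_tweet_file(src, dst):
    """Read src, sort its lines by timestamp, and write them to dst."""
    with open(src) as infile:
        # splitlines() drops the newlines, so a last line without "\n" is handled too
        lines = infile.read().splitlines()
    with open(dst, "w") as outfile:
        outfile.write("\n".join(sort_lines(lines)) + "\n")
```

For example, `sort_tweet_file('tweets.txt', 'tweets_sorted.txt')` leaves the original file untouched, which makes it easy to check the result before replacing anything.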

Answer 2

It looks like you've already solved the regex part, so to turn your datetime into a measurable quantity, convert it to seconds like this:

import time
# `when` is the datetime parsed in the question's loop
seconds = time.mktime(when.timetuple())
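A small check of this conversion, using the two timestamps from the question's sample lines. Note that `time.mktime()` interprets a naive datetime as local time, and that `datetime` objects already compare chronologically, so for sorting alone the conversion to seconds is optional. The variable names are illustrative.

```python
import time
from datetime import datetime

FMT = "%I:%M %p - %d %b %Y"
later = datetime.strptime("1:40 PM - 17 Feb 2016", FMT)
earlier = datetime.strptime("1:32 PM - 16 Feb 2016", FMT)

# mktime() treats the naive datetime as local time and returns float seconds since the epoch
later_secs = time.mktime(later.timetuple())
earlier_secs = time.mktime(earlier.timetuple())

print(later_secs > earlier_secs)  # True
print(later > earlier)            # True: datetimes compare directly
```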

Then, for the sorting, there are many ways to go about it. The simplest example is:

import operator
s = [("ab", 50), ("cd", 100), ("ef", 15)]
print(sorted(s, key=operator.itemgetter(1)))
# [('ef', 15), ('ab', 50), ('cd', 100)]
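Applied to the tweet lines, the same idea pairs each line with its timestamp in seconds and sorts on the second element. This sketch reuses the question's regex and format string; the hard-coded `lines` list is a stand-in for the file's contents, and lines without a timestamp are simply skipped here.

```python
import operator
import re
import time
from datetime import datetime

# The question's pattern: the timestamp sits between the last </a> and <br><br>
when = re.compile(r".+</a>(.+)<br><br>")

lines = [
    '@soandso something something <a href="http://pic.twitter.com/aphoto</a><a href="a link" target="_blank">Permalink</a> 1:40 PM - 17 Feb 2016<br><br>',
    '@soandso something something <a href="http://pic.twitter.com/aphoto</a><a href="a link" target="_blank">Permalink</a> 1:32 PM - 16 Feb 2016<br><br>',
]

pairs = []
for line in lines:
    match = when.search(line)
    if match:  # lines without a timestamp are dropped in this sketch
        stamp = datetime.strptime(match.group(1), " %I:%M %p - %d %b %Y")
        pairs.append((line, time.mktime(stamp.timetuple())))

# Sort the (line, seconds) tuples by the seconds value, then keep only the lines
sorted_lines = [line for line, _ in sorted(pairs, key=operator.itemgetter(1))]
for line in sorted_lines:
    print(line)
```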

Python - Sort lines in a file by a regex date match