I have an input file in the following format:
457526373620277249 17644162 Sat Apr 19 14:29:22 +0000 2014 0 nc nc U are expressing a wish not a fact ;) @Manicdj99 @ANTIVICTORIA @Nupe117 @cspanwj
457522541926842368 402127017 Sat Apr 19 14:14:09 +0000 2014 0 nc nc @dfwlibrarian You're a great one to call somebody else "educationally challenged!" I'd call that a name call. #YouLose #PJNET #TCOT #TGDNGO YouLose,PJNET,TCOT,TGDNGO
457519476511350786 65713724 Sat Apr 19 14:01:58 +0000 2014 0 nc nc @Manicdj99 @Nupe117 @cspanwj only some RW fringies are upset- & they're ALWAYS angry at something-also too fat 2 get out of lazyboys
I need to sort the data by time.
I am using the strptime function, but I cannot get the whole data set sorted by time.
import datetime

dt = []
for line in f:
    splits = line.split('\t')
    dt.append(datetime.datetime.strptime(splits[2], "%a %b %d %H:%M:%S +0000 %Y"))
    dt.sort()
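Answer 0 (score: 1)
You want to build a list of whole rows and only then sort that list once. Your code captures only the timestamps and re-sorts that list every time it appends a new one, so the rest of each line is ignored.
You can use the csv module: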
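import csv
from datetime import datetime
from operator import itemgetter

rows = []
# Python 3: open in text mode with newline=''; on Python 2 use open(yourfile, 'rb')
with open(yourfile, newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        # replace the timestamp string with a datetime so rows compare chronologically
        row[2] = datetime.strptime(row[2], "%a %b %d %H:%M:%S +0000 %Y")
        rows.append(row)

rows.sort(key=itemgetter(2))  # sort by the datetime column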
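If you would rather keep each original line untouched, the same idea can be sketched with a key function instead of rewriting the column. This is only a minimal illustration: it assumes the fields really are tab-separated, `parse_time` is a hypothetical helper name, and the sample lines reuse the IDs and timestamps from the question with the tweet text shortened.

```python
from datetime import datetime

TIME_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def parse_time(line):
    """Return the timestamp found in the third tab-separated field of a line."""
    return datetime.strptime(line.split("\t")[2], TIME_FORMAT)


# Sample lines in the question's layout (tweet text shortened for illustration).
lines = [
    "457526373620277249\t17644162\tSat Apr 19 14:29:22 +0000 2014\t0\tnc\tnc\tfirst",
    "457522541926842368\t402127017\tSat Apr 19 14:14:09 +0000 2014\t0\tnc\tnc\tsecond",
    "457519476511350786\t65713724\tSat Apr 19 14:01:58 +0000 2014\t0\tnc\tnc\tthird",
]

# Sort the complete lines once, after all of them have been collected.
sorted_lines = sorted(lines, key=parse_time)
for line in sorted_lines:
    print(line)
```

Because the key function only reads the timestamp, the sorted lines can be written back to a file exactly as they were read.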
Answer 1 (score: 1)
Assuming your data.txt file looks like this (I truncated it on the right):
457526373620277249 17644162 Sat Apr 19 14:29:22 +0000 2014 0
457522541926842368 402127017 Sat Apr 19 14:14:09 +0000 2014 0
457519476511350786 65713724 Sat Apr 19 14:01:58 +0000 2014 0
I am also assuming it is TAB-delimited.
This parses the data, converts the dates to proper datetime objects, and then sorts the rows with sorted(iterable, key=):
Example:
from __future__ import print_function

from datetime import datetime
from operator import itemgetter


def map_to_datetime(xs, index, format="%a %b %d %H:%M:%S +0000 %Y"):
    for x in xs:
        x[index] = datetime.strptime(x[index], format)


data = [line.split("\t") for line in map(str.strip, open("data.txt", "r"))]
map_to_datetime(data, 2)

for entry in sorted(data, key=itemgetter(2)):
    print(entry)
Output:
$ python -i foo.py
['457519476511350786', '65713724', datetime.datetime(2014, 4, 19, 14, 1, 58), '0']
['457522541926842368', '402127017', datetime.datetime(2014, 4, 19, 14, 14, 9), '0']
['457526373620277249', '17644162', datetime.datetime(2014, 4, 19, 14, 29, 22), '0']
>>>
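Note that the format string above hard-codes the literal "+0000". If the offsets in your file could ever differ, a possible variation on Python 3 is to parse the offset with %z, which yields timezone-aware datetime objects that still sort chronologically. This is only a sketch under that assumption; `parse_tweet_time` is a hypothetical helper name and the timestamps are the three from the question.

```python
from datetime import datetime, timezone


def parse_tweet_time(text):
    """Parse a timestamp such as 'Sat Apr 19 14:29:22 +0000 2014' with its UTC offset."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %z %Y")


stamps = [
    "Sat Apr 19 14:29:22 +0000 2014",
    "Sat Apr 19 14:14:09 +0000 2014",
    "Sat Apr 19 14:01:58 +0000 2014",
]

# Aware datetimes compare by absolute time, so mixed offsets would sort correctly too.
parsed = sorted(parse_tweet_time(s) for s in stamps)
for moment in parsed:
    print(moment.isoformat())
```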