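I have hundreds of text files that I need to parse by username and date. I am trying to put the useful data from the text files into a list like this: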
[
['1234245@gmail.com', '34209809', '1434546354', '2016-07-18 00:20:58'],
['abcd@gmail.com', '234534345', '09402380', '2016-07-18 00:20:03'],
['username@gmail.com', '345315531', '1098098098', '2016-07-18 02:40:00'],
['abcd@gmail.com', '345431353', '231200023', '2016-07-18 15:45:49'],
['1234245@gmail.com', '23232424', '234809809', '2016-07-18 20:45:40']
]
However, I want to sort them by date-time, grouped by username, so that the output looks like this:
[
['1234245@gmail.com', '23232424', '234809809', '2016-07-18 20:45:40'],
['1234245@gmail.com', '34209809', '1434546354', '2016-07-18 00:20:58'],
['abcd@gmail.com', '345431353', '231200023', '2016-07-18 15:45:49'],
['abcd@gmail.com', '234534345', '09402380', '2016-07-18 00:20:03'],
['username@gmail.com', '345315531', '1098098098', '2016-07-18 02:40:00']
]
Here is my code:
import glob
from operator import itemgetter
from itertools import groupby

def read_large_file(filename):
    matrix = []
    global username
    username = []
    for myfile in glob.glob(filename):
        with open(myfile, "r") as infile:
            for row in infile:
                row = row.strip()
                array = row.split(';')
                username.append(array[9])
                matrix.append(cdr(array[9], array[17], array[18], array[8]))
    return matrix

class cdr(object):
    def __init__(self, username, total_seconds_since_start, download_bytes, date_time):
        self.username = username
        self.total_seconds_since_start = total_seconds_since_start
        self.download_bytes = download_bytes
        self.date_time = date_time

def GroupByUsername(matrix):
    # itemgetter(0) indexes each item like a list, which cdr objects do not support
    new_matrix = groupby(matrix, itemgetter(0))
    return new_matrix

matrix = read_large_file(r'C:\Users\ceren\.spyder2/test/*')
matrix_new = GroupByUsername(matrix)
I tried to use the solution from this link: Sorting and Grouping Nested Lists in Python, but I got these errors:
'cdr' object does not support indexing
'cdr' object is not iterable
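Both errors suggest that the code treats cdr instances as if they were lists: itemgetter(0) evaluates obj[0], and a plain class like cdr does not support indexing. A minimal sketch, assuming the cdr class from the question, is to sort and group by attribute with operator.attrgetter instead. Note that itertools.groupby only merges consecutive items, so the records must be sorted by the grouping key first. The helper name group_by_username is hypothetical, and the sample records are built from the example list above purely for illustration.

```python
from itertools import groupby
from operator import attrgetter


class cdr(object):
    def __init__(self, username, total_seconds_since_start, download_bytes, date_time):
        self.username = username
        self.total_seconds_since_start = total_seconds_since_start
        self.download_bytes = download_bytes
        self.date_time = date_time


def group_by_username(records):
    # Hypothetical helper. Assumes date_time is a 'YYYY-MM-DD HH:MM:SS' string,
    # which sorts chronologically as plain text.
    # Pass 1: newest date_time first. Pass 2: username ascending; Python's sort
    # is stable, so each user's rows keep the newest-first order from pass 1.
    records = sorted(records, key=attrgetter("date_time"), reverse=True)
    records = sorted(records, key=attrgetter("username"))
    # groupby yields lazy sub-iterators, so turn each group into a list right away.
    return [(name, list(group))
            for name, group in groupby(records, key=attrgetter("username"))]


records = [
    cdr("1234245@gmail.com", "34209809", "1434546354", "2016-07-18 00:20:58"),
    cdr("abcd@gmail.com", "234534345", "09402380", "2016-07-18 00:20:03"),
    cdr("username@gmail.com", "345315531", "1098098098", "2016-07-18 02:40:00"),
    cdr("abcd@gmail.com", "345431353", "231200023", "2016-07-18 15:45:49"),
    cdr("1234245@gmail.com", "23232424", "234809809", "2016-07-18 20:45:40"),
]

for name, group in group_by_username(records):
    print(name, [r.date_time for r in group])
```

With the sample records this prints one line per username, in the same order as the expected output in the question. In the question's own code, the same function could be applied to the list returned by read_large_file.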
Answer:
You can simply use Python's built-in sorted():
sorted_list = sorted(data, key=lambda user_info: (user_info[0], user_info[3]))
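The lambda passed as key tells Python how to sort the list (ascending by default). For each entry in data, user_info is a list of 4 fields, so user_info[0] is the email address and user_info[3] is the date-time.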
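Below is a quick check of that key on the sample list from the question, assuming it is stored in a variable named data (with the missing commas restored). Because the key is ascending on both fields, each user's rows come out oldest first, whereas the question's expected output lists the newest row first. One way to get that exact order, shown here as an alternative rather than as part of the original answer, is to sort twice and rely on the fact that Python's sort is stable.

```python
data = [
    ['1234245@gmail.com', '34209809', '1434546354', '2016-07-18 00:20:58'],
    ['abcd@gmail.com', '234534345', '09402380', '2016-07-18 00:20:03'],
    ['username@gmail.com', '345315531', '1098098098', '2016-07-18 02:40:00'],
    ['abcd@gmail.com', '345431353', '231200023', '2016-07-18 15:45:49'],
    ['1234245@gmail.com', '23232424', '234809809', '2016-07-18 20:45:40'],
]

# The answer's key: email ascending, then date-time ascending.
# 'YYYY-MM-DD HH:MM:SS' strings compare in chronological order, so no parsing is needed.
sorted_list = sorted(data, key=lambda user_info: (user_info[0], user_info[3]))

# Alternative (not from the answer): newest date-time first within each email.
# Sort by date-time descending, then by email; the stable second sort
# preserves the date order among rows with the same email.
newest_first = sorted(data, key=lambda user_info: user_info[3], reverse=True)
newest_first = sorted(newest_first, key=lambda user_info: user_info[0])

for row in newest_first:
    print(row)
```

For the cdr objects in the question, the same keys work with attribute access (user_info.username and user_info.date_time) instead of indexing.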