Counting text by date in Python

Time: 2017-08-08 02:27:57

Tags: python

My file contains:

'2014-08-09':"a" 
'2014-08-09':"a" 
'2014-08-09':"b"
'2014-09-09':"b" 
'2014-06-09':"b" 

I need to count the text values per date. This is the expected output:

 2014-08-09-> a:2, b:1
 2014-09-09-> b:1
 2014-06-09-> b:1. 

Here is my code:

with open("file.txt") as file:
 my_list = file.readlines()
 result = {}
 for item in my_list:
     posix_time = item.split(':')[0]
     time_val = item.split(':')[1]
     date_ext = datetime.datetime.fromtimestamp(
        int(posix_time)
     ).strftime('%Y-%m-%d')
     if time_val not in result:
         result[time_val] = 0
     else:
         result[time_val] += 1 

3 answers:

Answer 0 (score: 1)

Here is a simple option:

import datetime
from collections import Counter, defaultdict
In [30]: with open("dates.txt") as f:
    ...:     res = defaultdict(dict)
    ...:     for line in f.readlines():
    ...:         date, letter = line.rstrip().split(':')
    ...:         letter = letter.replace("\"", "")
    ...:         date = datetime.datetime.strptime(date, "'%Y-%m-%d'")
    ...:         if letter in res[date]:
    ...:             res[date][letter] += 1
    ...:         else:
    ...:             res[date][letter] = 1

In [31]: res
Out[31]: 
defaultdict(dict,
            {datetime.datetime(2014, 6, 9, 0, 0): {'b': 1},
             datetime.datetime(2014, 8, 9, 0, 0): {'a': 2, 'b': 1},
             datetime.datetime(2014, 9, 9, 0, 0): {'b': 1}})

This assumes you want the keys to be datetime objects. Otherwise you can remove that part.
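If datetime keys are not needed, the same loop can keep the dates as plain strings. The following is only a minimal sketch under that assumption: the sample lines from the question are inlined instead of being read from dates.txt, and the helper name count_by_date is made up for illustration.

```python
from collections import defaultdict

# Sample lines copied from the question; normally they would come from the file.
SAMPLE_LINES = [
    "'2014-08-09':\"a\" \n",
    "'2014-08-09':\"a\" \n",
    "'2014-08-09':\"b\"\n",
    "'2014-09-09':\"b\" \n",
    "'2014-06-09':\"b\" \n",
]


def count_by_date(lines):
    """Return {date string: {letter: count}} (hypothetical helper)."""
    res = defaultdict(dict)
    for line in lines:
        date, letter = line.strip().split(":")
        date = date.strip("'")               # drop the single quotes around the date
        letter = letter.strip().strip('"')   # drop the double quotes around the letter
        res[date][letter] = res[date].get(letter, 0) + 1
    return res


counts = count_by_date(SAMPLE_LINES)
for date, letters in counts.items():
    summary = ", ".join("{}:{}".format(k, v) for k, v in letters.items())
    print("{}-> {}".format(date, summary))
```

Stripping the quotes by hand replaces the work that the "'%Y-%m-%d'" format string does in the strptime call, so malformed dates are no longer rejected.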

Or use a Counter instead of a dict inside the defaultdict:

In [36]: with open("dates.txt") as f:
    ...:     res = defaultdict(Counter)
    ...:     for line in f.readlines():
    ...:         date, letter = line.rstrip().split(':')
    ...:         letter = letter.replace("\"", "")
    ...:         date = datetime.datetime.strptime(date, "'%Y-%m-%d'")
    ...:         res[date].update({letter: 1})

In [37]: res
Out[37]: 
defaultdict(collections.Counter,
            {datetime.datetime(2014, 6, 9, 0, 0): Counter({'b': 1}),
             datetime.datetime(2014, 8, 9, 0, 0): Counter({'a': 2, 'b': 1}),
             datetime.datetime(2014, 9, 9, 0, 0): Counter({'b': 1})})
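The Counter version relies on Counter.update adding to existing counts, whereas dict.update would overwrite them. A short sketch of the difference, with variable names chosen only for illustration:

```python
from collections import Counter

c = Counter()
c.update({"a": 1})   # adds 1 to the count for "a"
c.update({"a": 1})   # adds again instead of overwriting
c["b"] += 1          # missing keys count as 0, so this works too
print(c)

plain = {}
plain.update({"a": 1})
plain.update({"a": 1})   # a plain dict simply replaces the value
print(plain)
```

Because missing keys count as 0, `res[date][letter] += 1` would likely be an equivalent way to write the update line in the session above.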

Or, as Alexander pointed out, you can use a lambda to build a nested defaultdict:

In [38]: with open("dates.txt") as f:
    ...:     res = defaultdict(lambda: defaultdict(int))
    ...:     for line in f.readlines():
    ...:         date, letter = line.rstrip().split(':')
    ...:         letter = letter.replace("\"", "")
    ...:         date = datetime.datetime.strptime(date, "'%Y-%m-%d'")
    ...:         res[date][letter] += 1      

In [39]: res
Out[39]: 
defaultdict(<function __main__.<lambda>>,
            {datetime.datetime(2014, 6, 9, 0, 0): defaultdict(int, {'b': 1}),
             datetime.datetime(2014, 8, 9, 0, 0): defaultdict(int,
                         {'a': 2, 'b': 1}),
             datetime.datetime(2014, 9, 9, 0, 0): defaultdict(int, {'b': 1})})

This works because int() returns 0, which I had never realized before, but it makes sense.
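To make that point concrete, here is a small sketch of how the default factory behaves. The keys are sample values, and the caveat at the end is general defaultdict behaviour rather than something stated in the answer.

```python
from collections import defaultdict

# int() called with no arguments returns 0, so it can serve as a default factory.
zero = int()

nested = defaultdict(lambda: defaultdict(int))
nested["2014-08-09"]["a"] += 1   # both levels are created on first access
nested["2014-08-09"]["a"] += 1

# Caveat: merely reading a missing key also creates it.
before = "2014-01-01" in nested
_ = nested["2014-01-01"]
after = "2014-01-01" in nested

print(zero, nested["2014-08-09"]["a"], before, after)
```

If that side effect is unwanted, checking membership with `in` or using `.get()` leaves the dictionary unchanged.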

To sort by the total letter count, and then by date:

In [64]: l = list(res.items())

In [65]: l
Out[65]: 
[(datetime.datetime(2014, 8, 9, 0, 0), defaultdict(int, {'a': 2, 'b': 1})),
 (datetime.datetime(2014, 9, 9, 0, 0), defaultdict(int, {'b': 1})),
 (datetime.datetime(2014, 6, 9, 0, 0), defaultdict(int, {'b': 1}))]

In [66]: l.sort(key=lambda x: (sum(x[1].values()), x[0]))

In [67]: l
Out[67]: 
[(datetime.datetime(2014, 6, 9, 0, 0), defaultdict(int, {'b': 1})),
 (datetime.datetime(2014, 9, 9, 0, 0), defaultdict(int, {'b': 1})),
 (datetime.datetime(2014, 8, 9, 0, 0), defaultdict(int, {'a': 2, 'b': 1}))]
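The order of the elements in the key tuple decides which criterion wins. The sketch below uses a hand-written list as a stand-in for list(res.items()) and compares a few possible orderings; which one is wanted depends on the use case.

```python
import datetime

# Hypothetical stand-in for list(res.items()) from the session above.
items = [
    (datetime.datetime(2014, 8, 9), {"a": 2, "b": 1}),
    (datetime.datetime(2014, 9, 9), {"b": 1}),
    (datetime.datetime(2014, 6, 9), {"b": 1}),
]

# Total count first, date as the tie-breaker (same key as in In [66]).
by_count = sorted(items, key=lambda x: (sum(x[1].values()), x[0]))

# Date first; the count would only matter for equal dates, which cannot
# happen here because every date appears once.
by_date = sorted(items, key=lambda x: (x[0], sum(x[1].values())))

# Largest total first, using reverse=True.
by_count_desc = sorted(items, key=lambda x: sum(x[1].values()), reverse=True)

for date, letters in by_date:
    print(date.strftime("%Y-%m-%d"), dict(letters))
```

sorted() returns a new list, while list.sort() as used in the session sorts in place.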

Answer 1 (score: 0)

You can iterate over the data and build the desired result. This uses ast.literal_eval to convert the quoted text into plain string values:

In []:
from collections import defaultdict
import datetime as dt
import ast

with open(<file>) as f:
    data = [[ast.literal_eval(word) for word in line.split(':')] for line in f]

result = {}
for date, c in data:
    date = dt.datetime.strptime(date, '%Y-%m-%d')
    result.setdefault(date, defaultdict(int))[c] += 1
result

Out[]:
{datetime.datetime(2014, 6, 9, 0, 0): defaultdict(int, {'b': 1}),
 datetime.datetime(2014, 8, 9, 0, 0): defaultdict(int, {'a': 2, 'b': 1}),
 datetime.datetime(2014, 9, 9, 0, 0): defaultdict(int, {'b': 1})}
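As a rough illustration of what ast.literal_eval does to each piece of a line, the sketch below parses a single sample line taken from the question. The variable names are illustrative, and the failure case at the end is an assumption about input that the question does not show.

```python
import ast

line = "'2014-08-09':\"a\" \n"   # one line from the question's file
parts = line.split(":")          # ["'2014-08-09'", '"a" \n']

# Each part is a quoted Python string literal, so literal_eval returns
# the text without the surrounding quotes.
date, letter = [ast.literal_eval(p) for p in parts]
print(repr(date), repr(letter))

# Text that is not a valid literal (for example an unquoted word) is rejected.
try:
    ast.literal_eval("abc")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```

Note that splitting on ':' assumes neither the date nor the value contains a colon.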

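Answer 2 (score: 0)

You can read the file into a list, build a dictionary keyed by date, and then iterate over each key's values to count and print them, for example: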

with open('file.txt', 'r') as f:
    data = [line.rstrip().split(':') for line in f]
    result = {}
    for sub in data:
        try:
            result[sub[0].replace("'", '')] += sub[1].replace('"', '')
        except KeyError:
            result[sub[0].replace("'", '')] = sub[1].replace('"', '')
    for k, v in result.items():  # on Python 2, result.iteritems() also works
        out = '{}-> '.format(k)
        for c in sorted(set(v)):  # sorted so the letters print in a stable order
            out += '{}: {} '.format(c, v.count(c))
        print(out)

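Output (the order of the dates depends on dictionary ordering, so it can differ between Python versions):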

2014-08-09-> a: 2 b: 1 
2014-06-09-> b: 1 
2014-09-09-> b: 1
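This answer concatenates all values for a date into one string and then counts characters, which fits the single-letter values in the question. The sketch below shows that step in isolation and, as an assumption beyond the question, what might change if the values were longer than one character.

```python
# What result['2014-08-09'] holds after the loop above: the letters joined together.
letters = "aab"
counts = {c: letters.count(c) for c in sorted(set(letters))}
print(counts)

# Hypothetical multi-character values: joining them would count substrings,
# so keeping the values in a list is safer.
words = ["ab", "a", "ab"]
joined_count = "".join(words).count("a")   # counts every "a", also inside "ab"
word_counts = {w: words.count(w) for w in sorted(set(words))}
print(joined_count, word_counts)
```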