How to create a dictionary from another dictionary when certain conditions are met

Time: 2016-08-21 10:00:43

Tags: python dictionary count

From this dictionary:

{0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'), 14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'), 17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'), 30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')}

I want to create another dictionary like this:

{u'Donald John Trump': 2, u'Barack Obama': 1, u'Michelle Obama': 1}

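Here the keys 0, 1, 2 and 30, 31, 32 each increase by 1, and the name they spell out occurs twice. The key pairs 14, 15 and 17, 18 each occur once. Is there a way to create such a dictionary?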

2 Answers:

Answer 0 (score: 3)

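I think the main problem you need to solve is identifying each person by grouping the keys that form a run of consecutive integers, as you described.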

Fortunately, Python has a recipe for this:

from itertools import groupby
from operator import itemgetter
from collections import defaultdict

dct = {
    0: ('Donald', 'PERSON'),
    1: ('John', 'PERSON'),
    2: ('Trump', 'PERSON'),
    14: ('Barack', 'PERSON'),
    15: ('Obama', 'PERSON'),
    17: ('Michelle', 'PERSON'),
    18: ('Obama', 'PERSON'),
    30: ('Donald', 'PERSON'),
    31: ('John', 'PERSON'),
    32: ('Trump', 'PERSON')
}

persons = defaultdict(int)  # Used for convenience
keys = sorted(dct.keys())   # So groupby() can recognize sequences

for k, g in groupby(enumerate(keys), lambda d: d[0] - d[1]):
    ids = map(itemgetter(1), g)                # [0, 1, 2], [14, 15], etc.
    person = ' '.join(dct[i][0] for i in ids)  # "Donald John Trump", "Barack Obama", etc
    persons[person] += 1

print(persons)
# defaultdict(<class 'int'>,
#        {'Barack Obama': 1,
#         'Donald John Trump': 2,
#         'Michelle Obama': 1})
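
The grouping key above works because, in a sorted list, the difference between an element's index and its value stays constant within a run of consecutive integers and changes as soon as there is a gap. A minimal sketch of just that step, where the helper name `consecutive_runs` is hypothetical and not part of the answer:

```python
from itertools import groupby

def consecutive_runs(keys):
    """Split integers into runs of consecutive values (hypothetical helper)."""
    runs = []
    ordered = sorted(keys)
    # index - value is constant within a run of consecutive integers
    for _, group in groupby(enumerate(ordered), lambda pair: pair[0] - pair[1]):
        runs.append([value for _, value in group])
    return runs

print(consecutive_runs([0, 1, 2, 14, 15, 17, 18, 30, 31, 32]))
# [[0, 1, 2], [14, 15], [17, 18], [30, 31, 32]]
```

Each run can then be joined into a full name and counted, as in the loop above.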

Answer 1 (score: 2)

def add_name(d, consecutive_keys, result):
    result_key = ' '.join(d[k][0] for k in consecutive_keys)
    if result_key in result:
        result[result_key] += 1
    else:
        result[result_key] = 1

d = {0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'),
     14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'),
     17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'),
     30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')}

sorted_keys = sorted(d.keys())
last_key = sorted_keys[0]
consecutive_keys = [last_key]
result = {}
for i in sorted_keys[1:]:
    if i == last_key + 1:
        consecutive_keys.append(i)
    else:
        add_name(d, consecutive_keys, result)
        consecutive_keys = [i]
    last_key = i
add_name(d, consecutive_keys, result)

print(result)

Output:

{'Donald John Trump': 2, 'Barack Obama': 1, 'Michelle Obama': 1}
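
As a variation, the same loop can probably be written with `collections.Counter` in place of the manual `if`/`else` in `add_name`. This is only a sketch, not part of either answer; the function name `count_names` is hypothetical, and it assumes the keys are integers and each value is a tuple whose first element is the name part.

```python
from collections import Counter

def count_names(d):
    """Count full names formed by runs of consecutive integer keys (hypothetical helper)."""
    counts = Counter()
    run = []  # keys of the run currently being built
    for key in sorted(d):
        if run and key != run[-1] + 1:
            # Gap found: the current run spells out one complete name
            counts[' '.join(d[k][0] for k in run)] += 1
            run = []
        run.append(key)
    if run:  # flush the last run; an empty dict simply yields no names
        counts[' '.join(d[k][0] for k in run)] += 1
    return dict(counts)

d = {0: ('Donald', 'PERSON'), 1: ('John', 'PERSON'), 2: ('Trump', 'PERSON'),
     14: ('Barack', 'PERSON'), 15: ('Obama', 'PERSON'),
     17: ('Michelle', 'PERSON'), 18: ('Obama', 'PERSON'),
     30: ('Donald', 'PERSON'), 31: ('John', 'PERSON'), 32: ('Trump', 'PERSON')}

print(count_names(d))
# {'Donald John Trump': 2, 'Barack Obama': 1, 'Michelle Obama': 1}
```

Unlike indexing `sorted_keys[0]` up front, this version does not raise on an empty input dictionary.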