Given this dictionary:
{0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'), 14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'), 17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'), 30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')}
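I want to create another dictionary like this:
{u'Donald John Trump': 2, u'Barack Obama': 1, u'Michelle Obama': 1}
Here the keys 0, 1, 2 and 30, 31, 32 each increase by 1, and both runs spell the same name, so that name occurs twice. The runs 14, 15 and 17, 18 each form a name that occurs once. Is there a way to build such a dictionary?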
Answer 0 (score: 3)
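I think the main problem you need to solve is identifying the people by grouping keys that form runs of consecutive increasing integers, as you described.
Fortunately, Python has a recipe for this.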
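The recipe relies on one observation: if you number a sorted list of integers with enumerate(), the difference between each value's position and the value itself stays constant inside a run of consecutive numbers and changes at every gap. A minimal sketch of just that step, using a hypothetical helper name `consecutive_runs` (not part of the standard library):

```python
from itertools import groupby


def consecutive_runs(numbers):
    """Split an already-sorted list of integers into runs of consecutive values.

    Inside a run, position - value is constant, so groupby() on that
    difference puts each run into its own group.
    """
    runs = []
    for _, group in groupby(enumerate(numbers), key=lambda pair: pair[0] - pair[1]):
        # Each item in the group is (position, value); keep only the value
        runs.append([value for _, value in group])
    return runs


print(consecutive_runs([0, 1, 2, 14, 15, 17, 18, 30, 31, 32]))
# [[0, 1, 2], [14, 15], [17, 18], [30, 31, 32]]
```

The input must be sorted first; on unsorted keys the differences do not stay constant within a run.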
from itertools import groupby
from operator import itemgetter
from collections import defaultdict
dct = {
0: ('Donald', 'PERSON'),
1: ('John', 'PERSON'),
2: ('Trump', 'PERSON'),
14: ('Barack', 'PERSON'),
15: ('Obama', 'PERSON'),
17: ('Michelle', 'PERSON'),
18: ('Obama', 'PERSON'),
30: ('Donald', 'PERSON'),
31: ('John', 'PERSON'),
32: ('Trump', 'PERSON')
}
persons = defaultdict(int)  # Used for convenience: unseen names start at 0
keys = sorted(dct.keys())  # So groupby() can recognize sequences
for k, g in groupby(enumerate(keys), lambda d: d[0] - d[1]):
    ids = map(itemgetter(1), g)  # [0, 1, 2], [14, 15], etc.
    person = ' '.join(dct[i][0] for i in ids)  # "Donald John Trump", "Barack Obama", etc.
    persons[person] += 1
print(persons)
# defaultdict(<class 'int'>,
# {'Barack Obama': 1,
# 'Donald John Trump': 2,
# 'Michelle Obama': 1})
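For comparison, here is a sketch of the same groupby() grouping that feeds the names straight into collections.Counter instead of incrementing a defaultdict(int) by hand. `count_names` is a hypothetical helper name, and the sketch assumes, like the answer, that every value is a (word, tag) tuple and that each run of consecutive keys is exactly one name.

```python
from collections import Counter
from itertools import groupby

dct = {
    0: ('Donald', 'PERSON'), 1: ('John', 'PERSON'), 2: ('Trump', 'PERSON'),
    14: ('Barack', 'PERSON'), 15: ('Obama', 'PERSON'),
    17: ('Michelle', 'PERSON'), 18: ('Obama', 'PERSON'),
    30: ('Donald', 'PERSON'), 31: ('John', 'PERSON'), 32: ('Trump', 'PERSON'),
}


def count_names(tokens):
    """Count full names formed by runs of consecutive integer keys (hypothetical helper)."""
    keys = sorted(tokens)
    # One joined name per run of consecutive keys
    names = (
        ' '.join(tokens[key][0] for _, key in group)
        for _, group in groupby(enumerate(keys), key=lambda pair: pair[0] - pair[1])
    )
    # Counter tallies how often each name appears
    return Counter(names)


print(count_names(dct))
```

Counter is a dict subclass, so the result can be used wherever the defaultdict was, and it also offers most_common() if you want the names ranked by frequency.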
Answer 1 (score: 2)
def add_name(d, consecutive_keys, result):
    # Join the words of one run of keys into a full name and count it
    result_key = ' '.join(d[k][0] for k in consecutive_keys)
    if result_key in result:
        result[result_key] += 1
    else:
        result[result_key] = 1
d = {0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'),
14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'),
17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'),
30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')}
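sorted_keys = sorted(d.keys())
last_key = sorted_keys[0]
consecutive_keys = [last_key]
result = {}
for i in sorted_keys[1:]:
    if i == last_key + 1:
        # Still inside the current run of consecutive keys
        consecutive_keys.append(i)
    else:
        # Gap found: count the finished run and start a new one
        add_name(d, consecutive_keys, result)
        consecutive_keys = [i]
    last_key = i
# Count the final run, which the loop never closes
add_name(d, consecutive_keys, result)
print(result)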
Output:
{'Donald John Trump': 2, 'Barack Obama': 1, 'Michelle Obama': 1}
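The loop in this answer needs an extra add_name() call after the loop for the last run, and it raises IndexError on an empty dictionary because of sorted_keys[0]. A sketch that moves the run-splitting into a generator avoids both; `iter_runs` and `count_person_names` are hypothetical names, not part of the answer.

```python
def iter_runs(sorted_keys):
    """Yield lists of consecutive integers from an already-sorted sequence."""
    run = []
    for key in sorted_keys:
        # A gap ends the current run
        if run and key != run[-1] + 1:
            yield run
            run = []
        run.append(key)
    # Emit the last run, if there was any input at all
    if run:
        yield run


def count_person_names(d):
    """Count names built from runs of consecutive keys (hypothetical helper)."""
    result = {}
    for run in iter_runs(sorted(d)):
        name = ' '.join(d[k][0] for k in run)
        result[name] = result.get(name, 0) + 1
    return result


d = {0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'),
     14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'),
     17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'),
     30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')}

print(count_person_names(d))
# {'Donald John Trump': 2, 'Barack Obama': 1, 'Michelle Obama': 1}
print(count_person_names({}))
# {}
```

Keeping the "where does a run end" logic in one generator means the counting loop never has to special-case the first or last run.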