Question

I extracted a list of dictionaries from Stanford NER and built the following list:

myList = [
{'A':{},'B':['C','D'],
'names': {'PERSON': [u'John Butters', u'Bill', u'Hillary Clinton'],'LOCATION': [],
 'ORGANIZATION': [u'FactSet', u'Pfizer Inc. PFE']}},
{'A':{'Hello'},'B':['F','E'], 
'names': {'PERSON': [u'Tim Anderson', u'Hillary Clinton'], 'LOCATION': [ u'US'], 
'ORGANIZATION': [u'Goldman Sachs GS', u'ConocoPhillips COP', u'FactSet']}},
{'A':{'right'},'B':['M','N'],
'names': {'PERSON': [u'Mohammed bin Salman', u'Spano'], 'LOCATION': [u'Saudi Arabia',u'Red Sea'],
 'ORGANIZATION': [u'Aramco', u'FactSet', u'Goldman Sachs GS']}}
 ]

In other words, I have a list like this:

myList = [{},{},{}]

Each dictionary holds the details of one particular document. The value under the 'names' key is itself a dictionary:

'names':{'PERSON':[], 'LOCATION':[], 'ORGANIZATION':[]}

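I want to get the frequency of each value under 'names' -> 'ORGANIZATION' across all documents, and then count how many times each pair of names appears together in myList. Any help would be appreciated. The frequency output should look like this: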

{u'FactSet': 3, u'Pfizer Inc. PFE':1, u'Goldman Sachs GS':2, u'ConocoPhillips COP':1, u'Aramco':1}

Finally, I want to count the co-occurrences of the names above. The output could be:

{[u'FactSet', u'Pfizer Inc. PFE']:1, 
[u'Goldman Sachs GS', u'ConocoPhillips COP']:1,
[u'Goldman Sachs GS', u'FactSet'] :2,
[u'Aramco', u'FactSet']:1, 
[u'Aramco', u'Goldman Sachs GS']:1 }

Answer 1

Here is a solution that uses itertools.combinations to easily get all pairs from a list:

from itertools import combinations

myList = [
{'A':{},'B':['C','D'],
'names': {'PERSON': [u'John Butters', u'Bill', u'Hillary Clinton'],'LOCATION': [],
 'ORGANIZATION': [u'FactSet', u'Pfizer Inc. PFE']}},
{'A':{'Hello'},'B':['F','E'],
'names': {'PERSON': [u'Tim Anderson', u'Hillary Clinton'], 'LOCATION': [ u'US'],
'ORGANIZATION': [u'Goldman Sachs GS', u'ConocoPhillips COP', u'FactSet']}},
{'A':{'right'},'B':['M','N'],
'names': {'PERSON': [u'Mohammed bin Salman', u'Spano'], 'LOCATION': [u'Saudi Arabia',u'Red Sea'],
 'ORGANIZATION': [u'Aramco', u'FactSet', u'Goldman Sachs GS']}}
 ]

orgs_by_group = [group['names']['ORGANIZATION'] for group in myList]

org_counts = {}
org_pair_counts = {}

for org_group in orgs_by_group:
    #Update counts of orgs
    for org in org_group:
        if org not in org_counts:
            org_counts[org] = 1
        else:
            org_counts[org] += 1

    #Update counts of org pairs
    for pair in combinations(org_group,2):
        k = '|'.join(sorted(pair)) #<-- key for org_pair_counts dict
        if k not in org_pair_counts:
            org_pair_counts[k] = 1
        else:
            org_pair_counts[k] += 1

print('Org counts:')
print(org_counts)
print('')
print('Org pair counts:')
print(org_pair_counts)

Output:

Org counts:
{u'Pfizer Inc. PFE': 1, u'ConocoPhillips COP': 1, u'Goldman Sachs GS': 2, u'FactSet': 3, u'Aramco': 1}

Org pair counts:
{u'Aramco|Goldman Sachs GS': 1, u'Aramco|FactSet': 1, u'FactSet|Goldman Sachs GS': 2, u'ConocoPhillips COP|FactSet': 1, u'ConocoPhillips COP|Goldman Sachs GS': 1, u'FactSet|Pfizer Inc. PFE': 1}
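
The two counting loops above can also be written more compactly with collections.Counter. This is only a sketch of the same idea, not a replacement for the answer's code: it keeps the '|'-joined string keys, and it hard-codes just the ORGANIZATION lists from the question so that it runs on its own.

```python
from collections import Counter
from itertools import combinations

# Only the ORGANIZATION lists from the question, one list per document.
orgs_by_group = [
    [u'FactSet', u'Pfizer Inc. PFE'],
    [u'Goldman Sachs GS', u'ConocoPhillips COP', u'FactSet'],
    [u'Aramco', u'FactSet', u'Goldman Sachs GS'],
]

org_counts = Counter()
org_pair_counts = Counter()

for org_group in orgs_by_group:
    # Counter.update adds 1 for every element it is given.
    org_counts.update(org_group)
    # Sorting each pair makes the key independent of the order in the list.
    org_pair_counts.update(
        '|'.join(sorted(pair)) for pair in combinations(org_group, 2)
    )

print(org_counts[u'FactSet'])                        # 3
print(org_pair_counts[u'FactSet|Goldman Sachs GS'])  # 2
```

A Counter is a dict subclass, so it can be read like the plain dictionaries above, and a key that was never counted returns 0 instead of raising KeyError.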

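Note: you cannot use a list as a dictionary key, so the example output for the co-occurrences would not work as written. That is why I used strings as keys instead.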
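
If pair-like keys are preferred over joined strings, tuples are one option, because tuples are hashable while lists are not. The following is a minimal sketch under that assumption; the variable names `list_key_ok` and `pair_counts` are made up for illustration, and the input is again just the ORGANIZATION lists from the question.

```python
from collections import Counter
from itertools import combinations

# Only the ORGANIZATION lists from the question, one list per document.
orgs_by_group = [
    [u'FactSet', u'Pfizer Inc. PFE'],
    [u'Goldman Sachs GS', u'ConocoPhillips COP', u'FactSet'],
    [u'Aramco', u'FactSet', u'Goldman Sachs GS'],
]

# A list is unhashable, so using one as a dictionary key raises TypeError.
try:
    {[u'FactSet', u'Pfizer Inc. PFE']: 1}
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Tuples are hashable; sorting first makes (a, b) and (b, a) the same key.
pair_counts = Counter(
    tuple(sorted(pair))
    for org_group in orgs_by_group
    for pair in combinations(org_group, 2)
)

print(list_key_ok)                                     # False
print(pair_counts[(u'FactSet', u'Goldman Sachs GS')])  # 2
```

A frozenset would also work as a key and needs no sorting, but a sorted tuple keeps the two names in a predictable order when printed.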
