Question

I am reading data from a file into a list of lists, like this:

sourceData = [[source, topic, score],[source, topic, score],[source, topic, score]...]

where the source and topic in each inner list may be the same or different.

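What I want to end up with is a dictionary that groups the topics associated with each source, together with their scores (the scores will later be averaged, but for the purposes of this question let's just list them as the values of the topic keys).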

Ideally, the result would look like a list of nested dictionaries, like this:

[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]

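I thought the best approach would be to create a Counter of the sources, then build a dictionary for each topic of each source, and store each dictionary as the value for its corresponding source. However, I am having trouble iterating correctly to get the desired result.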

Here is what I have so far:

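sourceDictList = []
sourceList = []

for row in sourceData:
    # Rows are laid out as [source, topic, score]
    source = row[0]
    topic = row[1]
    score = row[2]
    # Pair each source with a one-entry {topic: score} dict
    sourceDict = [source, {topic: score}]
    sourceDictList.append(sourceDict)
    sourceList.append(source)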

Here sourceDictList ends up as [[source, {topic: score}], ...] (essentially the data from the original list, reformatted), and sourceList is just a list of all the sources (with some repeats).

Then I initialize a Counter and match its sources against the sources in sourceDictList; on a match, I store the topic:score dictionary as the value for that key:

from collections import Counter

sourceCounter = Counter(sourceList)

for key,val in sourceCounter.items():
    for dictitem in sourceDictList:
        if dictitem[0] == key:
            sourceCounter[key] = dictitem[1]

But the output only keeps the last topic:score dictionary for each source. So instead of the desired:

[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]

I only get:

Counter({SOURCE1: {TOPIC_n: 'SCORE_n'}, SOURCE2: {TOPIC_n: 'SCORE_n'}, SOURCE3: {TOPIC_n: 'SCORE_n'}})

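I was under the impression that saving a unique key to a dictionary would append the key:value pair without overwriting the previous ones. Am I missing something?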

Any help would be appreciated.

Answer 1

We can do it like this:

sourceData = [
    ['source1', 'topic1', 'score1'],
    ['source1', 'topic2', 'score1'],
    ['source1', 'topic1', 'score2'],

    ['source2', 'topic1', 'score1'],
    ['source2', 'topic2', 'score2'],
    ['source2', 'topic1', 'score3'],
]

sourceDict = {}

for row in sourceData:
    source = row[0]
    topic = row[1]
    score = row[2]

    if source not in sourceDict:
        # This will be executed when the source
        # comes for the first time.
        sourceDict[source] = {}

    if topic not in sourceDict[source]:
        # This will be executed when the topic
        # inside that source comes for the first time.
        sourceDict[source][topic] = []

    sourceDict[source][topic].append(score)

print(sourceDict)
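
As a side note on why the loop in the question kept only the last pair: assigning to a key that already exists replaces its value rather than merging into it. Below is a minimal sketch of that behaviour, next to a more compact form of the two `if` checks above using `dict.setdefault`. The `rows` sample and the names `overwritten` and `grouped` are made up for illustration.

```python
# Hypothetical rows in [source, topic, score] order, for illustration only.
rows = [
    ['source1', 'topic1', 'score1'],
    ['source1', 'topic2', 'score2'],
    ['source1', 'topic1', 'score3'],
]

# Plain assignment: each row replaces whatever was stored for that source.
overwritten = {}
for source, topic, score in rows:
    overwritten[source] = {topic: score}

# setdefault: create the inner dict / list the first time a key is seen,
# then append to the existing list on every later row.
grouped = {}
for source, topic, score in rows:
    grouped.setdefault(source, {}).setdefault(topic, []).append(score)

print(overwritten)  # {'source1': {'topic1': 'score3'}}
print(grouped)      # {'source1': {'topic1': ['score1', 'score3'], 'topic2': ['score2']}}
```

Whether `setdefault` reads better than the explicit `if` checks is a matter of taste; the resulting dictionary is the same.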

Answer 2

You can simply use defaultdict from collections:

sourdata = [['source', 'topic', 2],['source', 'topic', 3], ['source', 'topic2', 3],['source2', 'topic', 4]]

from collections import defaultdict

sourceDict = defaultdict(dict)


for source, topic, score in sourdata:
    topicScoreDict = sourceDict[source]
    topicScoreDict[topic] = topicScoreDict.get(topic, []) + [score]

>>> print(sourceDict)
defaultdict(<class 'dict'>, {'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}})
>>> print(dict(sourceDict))
{'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}}
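
Since the question mentions averaging the scores afterwards, here is one possible sketch of that follow-up step. It assumes the scores are numeric and swaps in a nested `defaultdict` (an outer one whose default is an inner `defaultdict(list)`) so that no `.get` call is needed. The names `rows`, `grouped` and `averages` are illustrative and not part of the original answer.

```python
from collections import defaultdict

# Same sample rows as above, in [source, topic, score] order.
rows = [['source', 'topic', 2], ['source', 'topic', 3],
        ['source', 'topic2', 3], ['source2', 'topic', 4]]

# Outer default: a new inner defaultdict; inner default: an empty list.
grouped = defaultdict(lambda: defaultdict(list))
for source, topic, score in rows:
    grouped[source][topic].append(score)

# Collapse each list of scores to its mean, producing plain dicts.
averages = {
    source: {topic: sum(scores) / len(scores) for topic, scores in topics.items()}
    for source, topics in grouped.items()
}

print(averages)  # {'source': {'topic': 2.5, 'topic2': 3.0}, 'source2': {'topic': 4.0}}
```

Every score list here has at least one entry, because a list is only created when a score is appended, so the division cannot hit an empty list.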

How to iterate over a nested dictionary (Counter) and recursively update keys
