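Reading two dictionaries from the same file (Python)

Time: 2017-03-25 13:23:54

Tags: python

I'm new to Python, and I'm trying to read a text file into two dictionaries whose values are lists.

The file contains the following: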

term1  doc1 doc3 doc4
term2  doc5 doc1
term3  doc6 doc2

I'm trying to create two dictionaries from the same file: one with terms as keys and documents as values, and the other the reverse.

inverted_index = {}
forward_index = {}
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, doc = items[0], items[1:]
        for doc in items[1:]:
            inverted_index[term] = [doc]
            forward_index[doc] = [term]

print(inverted_index)
print(forward_index)

So far, this is the output I get:

{'term2': ['doc1'], 'term1': ['doc4'], 'term3': ['doc2']}
{'doc3': ['term1'], 'doc6': ['term3'], 'doc4': ['term1'], 'doc5': ['term2'], 'doc1': ['term2'], 'doc2': ['term3']}

But this is the output I'm looking for:

{'term1': ['doc1','doc3','doc4'], 'term2': ['doc5','doc1'], 'term3': ['doc6','doc2']}
{'doc1': ['term1','term2'], 'doc3': ['term1'], 'doc4': ['term1'], 'doc5': ['term2'], 'doc6': ['term3'], 'doc2': ['term3']}

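Please help me solve this!

4 answers:

Answer 0: (score: 3)

You don't need to add to inverted_index in the inner loop; just assign it once per line.

In the inner loop, you need to append to the dictionary entry if it already exists, rather than overwriting it.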

inverted_index = {}
forward_index = {}
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
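        term, docs = items[0], items[1:]
        # One assignment per line: the term maps to all docs on that line.
        inverted_index[term] = docs
        for doc in docs:
            # Append to the existing list, creating it on first use.
            forward_index.setdefault(doc, []).append(term)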

print(inverted_index)
print(forward_index)
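
To see why the original code kept only the last value, here is a minimal sketch comparing plain assignment with setdefault. The (doc, term) pairs are inlined rather than read from term_sample.txt, and the names overwritten and accumulated are made up for illustration.

```python
# Sample (doc, term) pairs inlined for illustration; not read from a file.
pairs = [("doc1", "term1"), ("doc3", "term1"), ("doc1", "term2")]

overwritten = {}
accumulated = {}
for doc, term in pairs:
    # Plain assignment replaces whatever list was stored before.
    overwritten[doc] = [term]
    # setdefault returns the existing list, or stores and returns a new one.
    accumulated.setdefault(doc, []).append(term)

print(overwritten)
print(accumulated)
```

With assignment, doc1 ends up with only term2; with setdefault, it keeps both term1 and term2.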

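Answer 1: (score: 1)

You can use defaultdict(list) from the collections module. It creates an empty list the first time a key is accessed, so in your solution you can append every time a key is updated without checking whether it already exists: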

#!/usr/bin/env python 

from collections import defaultdict

inverted_index = defaultdict(list)
forward_index = defaultdict(list)
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, docs = items[0], items[1:]
        for doc in docs:
            inverted_index[term].append(doc)
            forward_index[doc].append(term)

print(inverted_index)
print(forward_index)
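
One thing to be aware of: printing a defaultdict shows the defaultdict(...) wrapper rather than a plain dictionary. A minimal sketch of converting back with dict() before printing follows. Here io.StringIO stands in for the real file (an assumption, so the snippet runs on its own), with the contents copied from the question.

```python
import io
from collections import defaultdict

# io.StringIO stands in for open('term_sample.txt'); contents copied from the question.
sample = io.StringIO("term1  doc1 doc3 doc4\nterm2  doc5 doc1\nterm3  doc6 doc2\n")

inverted_index = defaultdict(list)
forward_index = defaultdict(list)
for line in sample:
    if not line.strip():
        continue  # skip blank lines so the unpacking below cannot fail
    term, *docs = line.split()
    for doc in docs:
        inverted_index[term].append(doc)
        forward_index[doc].append(term)

# dict() drops the defaultdict wrapper so the output is a plain dictionary.
inverted_index = dict(inverted_index)
forward_index = dict(forward_index)
print(inverted_index)
print(forward_index)
```

Converting also means a later lookup of a missing key raises KeyError instead of silently inserting an empty list.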

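Answer 2: (score: 1)

inverted_index should not be assigned inside the inner for loop, and for forward_index you were replacing the previous value on every pass of the inner loop. Try the following code: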

inverted_index = {}
forward_index = {}
with open('test') as f:
    for line in f:
        items = line.split()
        term, docs = items[0], items[1:]
        inverted_index[term] = docs
        for doc in docs:
            terms = forward_index.get(doc, [])
            terms.append(term)
            forward_index[doc] = terms

print(inverted_index)
print(forward_index)

Answer 3: (score: 1)

As 'coder' suggested, I would also use defaultdict here. Since a doc may appear more than once across terms, you should use a set to avoid duplicates:

from collections import defaultdict

inverted_index = defaultdict(set)
forward_index = defaultdict(list)
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, docs = items[0], items[1:]
        inverted_index[term].update(docs)
        for doc in docs:
            forward_index[doc].append(term)

print(inverted_index)
print(forward_index)

(As Barmar suggested, you only need to update inverted_index once, in the outer loop.)
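
One trade-off of a set is that it does not keep the order in which the docs appeared. If the order matters, a possible alternative is to deduplicate with dict.fromkeys, sketched below. It assumes Python 3.7 or later, where dictionaries preserve insertion order, and the input line with a repeated doc is a made-up example rather than data from the question.

```python
# Hypothetical line with a repeated doc, to show deduplication.
line = "term1  doc1 doc3 doc1 doc4"
term, *docs = line.split()

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_docs = list(dict.fromkeys(docs))
print(term, unique_docs)
```

The result is a list, so it can be stored in inverted_index in place of the set.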