Python: merge two dictionaries into a nested dictionary (text similarity)

Time: 2018-12-09 11:16:56

Tags: python dictionary nested

I have the following documents:

documents = ["Human machine interface for lab abc computer applications",
              "A survey of user opinion of computer system response time",
              "The EPS user interface management system",
              "System and human system engineering testing of EPS",
              "Relation of user perceived response time to error measurement",
              "The generation of random binary unordered trees",
              "The intersection graph of paths in trees",
              "Graph minors IV Widths of trees and well quasi ordering",
              "Graph minors A survey"]

From these I build a word matrix:

wordmatrix = [sentence.split(" ") for sentence in documents]
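`split(" ")` splits on every single space, so a document containing two consecutive spaces would produce an empty-string "word". A minimal sketch, assuming plain whitespace tokenization is all that is needed, showing the difference with `str.split()` called without an argument (the two-document list is just a shortened sample):

```python
documents = [
    "Human machine interface for lab abc computer applications",
    "Graph minors A survey",
]

# Splitting on " " keeps an empty string wherever spaces repeat
print("lab  abc".split(" "))  # ['lab', '', 'abc']

# split() with no argument splits on any run of whitespace and drops empties
print("lab  abc".split())     # ['lab', 'abc']

wordmatrix = [sentence.split() for sentence in documents]
print(wordmatrix[1])          # ['Graph', 'minors', 'A', 'survey']
```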

Output:

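[['Human', 'machine', 'interface', 'for', 'lab', 'abc', 'computer', 'applications'],
 ['A', 'survey', 'of', 'user', 'opinion', 'of', 'computer', 'system', 'response', 'time'],
 ['The', 'EPS', 'user', 'interface', 'management', 'system'],
 ['System', 'and', 'human', 'system', 'engineering', 'testing', 'of', 'EPS'],
 ['Relation', 'of', 'user', 'perceived', 'response', 'time', 'to', 'error', 'measurement'],
 ['The', 'generation', 'of', 'random', 'binary', 'unordered', 'trees'],
 ['The', 'intersection', 'graph', 'of', 'paths', 'in', 'trees'],
 ['Graph', 'minors', 'IV', 'Widths', 'of', 'trees', 'and', 'well', 'quasi', 'ordering'],
 ['Graph', 'minors', 'A', 'survey']]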

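Next, I want to build a dictionary with one key per document, where each value is another dictionary that uses the words as keys and the number of times each word appears in that document as values.

But I only got this far:

Initialize the dictionaries: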

dic1 = {}
dic2 = {}
d = {}

The first dictionary gives each document a key:

dic1 = dict(enumerate(wordmatrix))

Output:

{0: ['Human', 'machine', 'interface', 'for', 'lab', 'abc', 'computer', 'applications'],
 1: ['A', 'survey', 'of', 'user', 'opinion', 'of', 'computer', 'system', 'response', 'time'],
 2: ['The', 'EPS', 'user', 'interface', 'management', 'system'],
 3: ['System', 'and', 'human', 'system', 'engineering', 'testing', 'of', 'EPS'],
 4: ['Relation', 'of', 'user', 'perceived', 'response', 'time', 'to', 'error', 'measurement'],
 5: ['The', 'generation', 'of', 'random', 'binary', 'unordered', 'trees'],
 6: ['The', 'intersection', 'graph', 'of', 'paths', 'in', 'trees'],
 7: ['Graph', 'minors', 'IV', 'Widths', 'of', 'trees', 'and', 'well', 'quasi', 'ordering'],
 8: ['Graph', 'minors', 'A', 'survey']}

The second dictionary, which makes each word a key:

for sentence in wordmatrix:
    for word in sentence:
        dic2[word] = dic2.get(word, 0) + 1
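The same corpus-wide tally can be produced with `collections.Counter` from the standard library, using `itertools.chain.from_iterable` to flatten the word matrix. A minimal sketch, assuming the `documents` list from the question; `Counter` is a `dict` subclass, so `dict(dic2)` gives a plain dictionary with the same contents as the loop:

```python
from collections import Counter
from itertools import chain

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

wordmatrix = [sentence.split(" ") for sentence in documents]

# chain.from_iterable yields every word of every document in order;
# Counter maps each distinct word to how many times it was seen
dic2 = Counter(chain.from_iterable(wordmatrix))

print(dic2["of"])     # 7
print(dic2["trees"])  # 3
```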

Output:

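{'Human': 1, 'machine': 1, 'interface': 2, 'for': 1, 'lab': 1, 'abc': 1,
 'computer': 2, 'applications': 1, 'A': 2, 'survey': 2, 'of': 7, 'user': 3,
 'opinion': 1, 'system': 3, 'response': 2, 'time': 2, 'The': 3, 'EPS': 2,
 'management': 1, 'System': 1, 'and': 2, 'human': 1, 'engineering': 1,
 'testing': 1, 'Relation': 1, 'perceived': 1, 'to': 1, 'error': 1,
 'measurement': 1, 'generation': 1, 'random': 1, 'binary': 1, 'unordered': 1,
 'trees': 3, 'intersection': 1, 'graph': 1, 'paths': 1, 'in': 1, 'Graph': 2,
 'minors': 2, 'IV': 1, 'Widths': 1, 'well': 1, 'quasi': 1, 'ordering': 1}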

However, I want to merge the two dictionaries into a single dictionary that looks like this: {0: {'Human': 1, 'machine': 1, 'interface': 2, ...}, 1: (and so on)}
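Building the nested dictionary only needs the `dic2` counting loop to run once per document, with a fresh inner dictionary each time. A minimal sketch reusing the question's `documents`, `wordmatrix`, and `d` names (`doc_id` and `counts` are illustrative names). Per-document counts differ from the corpus-wide totals in `dic2`: `'interface'` occurs once in document 0 but twice across all documents.

```python
documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

wordmatrix = [sentence.split(" ") for sentence in documents]

d = {}
for doc_id, sentence in enumerate(wordmatrix):
    counts = {}  # word -> number of times it occurs in this document only
    for word in sentence:
        counts[word] = counts.get(word, 0) + 1
    d[doc_id] = counts

print(d[0])
# {'Human': 1, 'machine': 1, 'interface': 1, 'for': 1, 'lab': 1, 'abc': 1, 'computer': 1, 'applications': 1}
print(d[1]["of"])  # 2
```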

Thanks!

1 answer:

Answer 0 (score: 0)

You don't have to merge the two dictionaries; you can build the new dictionary directly by running the `dic2` counting loop once per document instead of once over all documents.
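The same nested structure can be written more compactly with `collections.Counter` from the standard library, which counts the items of an iterable. A sketch, assuming the words in `wordmatrix` are already tokenized the way you want:

```python
from collections import Counter

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

wordmatrix = [sentence.split(" ") for sentence in documents]

# One Counter per document, keyed by the document's position in the list
d = {doc_id: Counter(words) for doc_id, words in enumerate(wordmatrix)}

print(d[1]["of"])       # 2
print(d[7]["trees"])    # 1
print(d[0]["missing"])  # 0: a Counter returns 0 for absent words instead of raising KeyError
```

`Counter` is a `dict` subclass, so the result indexes like the requested structure; wrap each value in `dict(...)` if plain dictionaries are required. If `'System'` and `'system'` should count as the same word, lowercase before counting, e.g. `Counter(w.lower() for w in words)`.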