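I need help merging two dictionaries, using the keys of one to look up the values of the other. When a match is found, the second dictionary should append its own values to the first (updating it without overwriting the values already there).
Code (apologies, this is my first custom script):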
import re
otuid2clusteridlist = dict()
finallist = otuid2clusteridlist
clusterid2denoiseidlist = dict()
#first block, also = finallist we append all other blocks into.
for line in open('cluster_97.ucm', 'r'):
    lineArray = re.split(r'\s+', line)
    otuid = lineArray[0]
    clusterid = lineArray[3]
    if otuid in otuid2clusteridlist:
        otuid2clusteridlist[otuid].append(clusterid)
    else:
        otuid2clusteridlist[otuid] = list()
        otuid2clusteridlist[otuid].append(clusterid)
#second block, higher tier needs to expand previous blocks hash
for line in open('denoise.ucm_test', 'r'):
    lineArray = re.split(r'\s+', line)
    clusterid = lineArray[4]
    denoiseid = lineArray[3]
    if clusterid in clusterid2denoiseidlist:
        clusterid2denoiseidlist[clusterid].append(denoiseid)
    else:
        clusterid2denoiseidlist[clusterid] = list()
        clusterid2denoiseidlist[clusterid].append(denoiseid)
#print/return function for testing (will convert to write out later)
for key in finallist:
    print("OTU: %s has %d sequence(s) which = %s" % (key, len(finallist[key]), finallist[key]))
The first block returns:
OTU: 3 has 3 sequence(s) which = ['5PLAS.R2.h_35336', 'GG13_52054', 'GG13_798']
OTU: 5 has 1 sequence(s) which = ['DEX1.h_14175']
OTU: 4 has 1 sequence(s) which = ['PLAS.h_34150']
OTU: 7 has 1 sequence(s) which = ['DEX12.13.h_545']
OTU: 6 has 1 sequence(s) which = ['GG13_45705']
The second block returns:
OTU: GG13_45705 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']
So the goal is to merge the second block's output into the first. I would like it to be added like this:
...
OTU: 6 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']
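I tried dict.update, but it just adds the second block's contents to the first as a new entry, because the first block has no such key.
I think my problem is more involved: I need the second block to look at the first block's values and append its own values to the matching list.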
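To make the dict.update behavior concrete, here is a minimal sketch using values copied from the output above. The names first, second and merged are only for illustration, and single-argument print(...) is used so it should behave the same on Python 2 and 3.

```python
# Illustrative data only: one entry from each block's output.
first = {'6': ['GG13_45705']}
second = {'GG13_45705': ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']}

merged = dict(first)     # shallow copy so `first` itself is left untouched
merged.update(second)    # update() matches on keys only, never on values

# 'GG13_45705' is a value in `first`, not a key, so it arrives as a new entry
print(sorted(merged))    # ['6', 'GG13_45705']
print(merged['6'])       # ['GG13_45705'] (OTU 6 was not extended)
```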
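I have been trying loops and .append (similar to the code already written), but I lack the overall Python knowledge to solve this.
Any ideas?
Addendum,
Some subsets of the data:
cluster_97.ucm (the file for block one):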
5 376 * DEX1.h_14175 DEX1.h_14175
6 294 * GG13_45705 GG13_45705
0 447 98.7 DEX22.h_37221 DEX29.h_4583
1 367 98.9 DEX14.15.h_35477 DEX27.h_779
1 443 98.4 DEX27.h_3794 DEX27.h_779
0 478 97.9 DEX22.h_7519 DEX29.h_4583
denoise.ucm_test (the file for block two):
11 294 * GG13_45705 GG13_45705
11 278 99.6 GG13_6312 GG13_45705
11 285 99.6 GG13_32148 GG13_45705
11 275 99.6 GG13_35246 GG13_45705
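I chose these subsets because the second line of file one is the entry that file two will be updating.
In case anyone wants to give it a try.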
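Answer 0 (score: 0)
Updated to reflect matching on values...
I think the solution to your problem lies in the fact that a list in Python is mutable, and a variable (or dictionary entry) that holds a mutable value is only a reference to it. So we can use a second dictionary that maps each value to the list it belongs to.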
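As a minimal sketch of that reference behavior on its own (the names by_otu and by_cluster are hypothetical and appear nowhere in the script):

```python
# Two dictionaries whose entries point at the very same list object.
by_otu = {'6': ['GG13_45705']}
by_cluster = {'GG13_45705': by_otu['6']}   # stores a reference, not a copy

# Appending through one dictionary is visible through the other.
by_cluster['GG13_45705'].append('GG13_6312')

print(by_otu['6'])                               # ['GG13_45705', 'GG13_6312']
print(by_otu['6'] is by_cluster['GG13_45705'])   # True
```

The full script applies the same idea to the two files: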
import re
otuid2clusteridlist = dict()
finallist = otuid2clusteridlist
clusterid2denoiseidlist = dict()
known_clusters = dict()
#first block, also = finallist we append all other blocks into.
for line in open('cluster_97.ucm', 'r'):
    lineArray = re.split(r'\s+', line)
    otuid = lineArray[0]
    clusterid = lineArray[3]
    if otuid in otuid2clusteridlist:
        otuid2clusteridlist[otuid].append(clusterid)
    else:
        otuid2clusteridlist[otuid] = list()
        otuid2clusteridlist[otuid].append(clusterid)
    # remember the clusters: map each cluster id to the list that holds it
    known_clusters[clusterid] = otuid2clusteridlist[otuid]
#second block, higher tier needs to expand previous blocks hash
for line in open('denoise.ucm_test', 'r'):
    lineArray = re.split(r'\s+', line)
    clusterid = lineArray[4]
    denoiseid = lineArray[3]
    if clusterid in clusterid2denoiseidlist:
        clusterid2denoiseidlist[clusterid].append(denoiseid)
    else:
        clusterid2denoiseidlist[clusterid] = list()
        clusterid2denoiseidlist[clusterid].append(denoiseid)
    # match the cluster and update as needed
    matched_cluster = known_clusters.setdefault(clusterid, [])
    if denoiseid not in matched_cluster:
        matched_cluster.append(denoiseid)
#print/return function for testing (will convert to write out later)
for key in finallist:
    print("OTU: %s has %d sequence(s) which = %s" % (key, len(finallist[key]), finallist[key]))
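I'm not sure whether you still need clusterid2denoiseidlist, so I added a new dictionary, known_clusters, to hold the mapping from each value to its list.
I'm not sure I have covered every edge case of the real problem, but with the test input provided this produces the desired output.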
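For anyone who wants to try the approach without the input files, here is a minimal self-contained sketch rather than a definitive implementation. It assumes the sample rows from the question inlined as strings. The function name merge_clusters and the constants CLUSTER_ROWS and DENOISE_ROWS are hypothetical, and collections.defaultdict stands in for the if/else initialization.

```python
from collections import defaultdict

# Sample rows copied from the question; real input would come from the files.
CLUSTER_ROWS = """\
5 376 * DEX1.h_14175 DEX1.h_14175
6 294 * GG13_45705 GG13_45705
0 447 98.7 DEX22.h_37221 DEX29.h_4583
1 367 98.9 DEX14.15.h_35477 DEX27.h_779
1 443 98.4 DEX27.h_3794 DEX27.h_779
0 478 97.9 DEX22.h_7519 DEX29.h_4583
"""
DENOISE_ROWS = """\
11 294 * GG13_45705 GG13_45705
11 278 99.6 GG13_6312 GG13_45705
11 285 99.6 GG13_32148 GG13_45705
11 275 99.6 GG13_35246 GG13_45705
"""

def merge_clusters(cluster_lines, denoise_lines):
    """Hypothetical helper: build OTU -> ids, then extend via the denoise rows."""
    otu_to_ids = defaultdict(list)
    known_clusters = {}  # cluster id -> the same list object stored in otu_to_ids
    for line in cluster_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or short lines
        otuid, clusterid = fields[0], fields[3]
        otu_to_ids[otuid].append(clusterid)
        known_clusters[clusterid] = otu_to_ids[otuid]
    for line in denoise_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        denoiseid, clusterid = fields[3], fields[4]
        target = known_clusters.get(clusterid)
        # Only extend lists seen in the first file, and avoid duplicates.
        if target is not None and denoiseid not in target:
            target.append(denoiseid)
    return dict(otu_to_ids)

result = merge_clusters(CLUSTER_ROWS.splitlines(), DENOISE_ROWS.splitlines())
for otuid in sorted(result):
    print("OTU: %s has %d sequence(s) which = %s"
          % (otuid, len(result[otuid]), result[otuid]))
```

One difference from the script above: this sketch uses known_clusters.get, so denoise rows whose cluster never appeared in the first file are simply skipped, whereas setdefault would store them in known_clusters without ever adding them to finallist. Either way the printed OTU lists come out the same.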