I have a script that loops over multiple files. For each file, I count how often specific combinations occur in it.
I do this with the following code:
from collections import Counter
import operator

with open("%s" %files) as f:
    freqs = {}
    sortedFreqs = []
    # read lines of csv file
    for l in f.readlines():
        # some code here (not added) which fills the mutationList value
    # this dict stores how often which mutation occurs.
    freqs = Counter(mutationList)
    # same list, only sorted.
    sortedFreqs = sorted(freqs.iteritems(), key=operator.itemgetter(1), reverse=True)
So the freqs variable contains a very long list of entries.
Example:
'FAM123Ap.Y550': 1, 'SMARCB1p.D192': 1, 'CSMD3p.T1137': 3
I now want to sort them by the second value (the count); the sorted entries are stored in sortedFreqs.
Example:
'CSMD3p.T1137': 3, 'FAM123Ap.Y550': 1, 'SMARCB1p.D192': 1
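This all works fine, but I now want to loop over multiple files and add all the frequencies found together. So if I find the 'CSMD3p.T1137' value 2 more times, I want to store 'CSMD3p.T1137': 5.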
wanted output:
totalFreqs = 'FAM123Ap.Y550': 1, 'SMARCB1p.D192': 1, 'CSMD3p.T1137': 5, 'TRPM1p.R551': 2
totalFreqsSorted = 'CSMD3p.T1137': 5, 'TRPM1p.R551': 2, 'FAM123Ap.Y550': 1, 'SMARCB1p.D192': 1
How do I "add up" the values for matching keys of a dictionary in Python? (How do I correctly fill in the values of totalFreqs and totalFreqsSorted?)
Answer 0 (score: 0)
Use a single Counter() object for all the counts, and update it for each file:
freqs = Counter()
for file in files:
    with open(...) as f:
        #
        freqs.update(mutationList)
Or you can add the counters together by simply summing them:
total_freqs = Counter()
for file in files:
    with open(...) as f:
        #
        freqs = Counter(mutationList)
        total_freqs += freqs
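As a minimal sketch of both approaches, the snippet below uses hard-coded lists in place of the per-file parsing, which the question does not show. The name mutation_lists_per_file and its contents are invented for illustration only.

```python
from collections import Counter

# Hypothetical stand-in for the mutationList built from each file;
# the real parsing code is not shown in the question.
mutation_lists_per_file = [
    ["CSMD3p.T1137", "CSMD3p.T1137", "CSMD3p.T1137",
     "FAM123Ap.Y550", "SMARCB1p.D192"],
    ["CSMD3p.T1137", "CSMD3p.T1137", "TRPM1p.R551", "TRPM1p.R551"],
]

# Approach 1: one Counter, updated in place for every file.
updated = Counter()
for mutation_list in mutation_lists_per_file:
    updated.update(mutation_list)

# Approach 2: a fresh Counter per file, summed into a running total.
summed = Counter()
for mutation_list in mutation_lists_per_file:
    summed += Counter(mutation_list)

print(updated == summed)        # both approaches give the same totals
print(updated["CSMD3p.T1137"])  # 3 from the first list + 2 from the second
```

Both loops end with the same totals. update() skips the intermediate per-file Counter, while += is convenient if you also need the per-file counts on their own.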
Note that Counter() objects already provide a list of frequencies sorted in descending order; just use the Counter.most_common() method instead of sorting it yourself:
sortedFreqs = freqs.most_common()
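For illustration, here is a short sketch with the totals from the question typed in by hand; total_freqs below is sample data taken from the wanted output, not the result of reading real files.

```python
from collections import Counter

# Sample totals copied from the question's wanted output.
total_freqs = Counter({
    "FAM123Ap.Y550": 1,
    "SMARCB1p.D192": 1,
    "CSMD3p.T1137": 5,
    "TRPM1p.R551": 2,
})

# List of (mutation, count) tuples, highest count first.
total_freqs_sorted = total_freqs.most_common()
for mutation, count in total_freqs_sorted:
    print(mutation, count)

# Passing n returns only the n most frequent entries.
top_two = total_freqs.most_common(2)
```

The result is a list of (key, count) tuples rather than a dict, the same shape that sorting the items with operator.itemgetter(1) would give.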