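I have a set of files from which I want to extract particular names and count how often each name occurs in each file. I want the final result to be a dictionary of dictionaries, like this:
{ID1: {sam: 1, maj: 5, tif: 7, paul: 1}, ID2: {maj: 4, bib: 5}, ..}
I wrote the following code for this: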
val={}
for m in result:
    f= open(path+m[1]+'.txt', 'r')
    for line in f:
        search_str= "my_name"
        if line.startswith(search_str):
            linename = line.split(' ',2)[1].strip()
            key= get_name_part(linename)
            val[key] = val.get(key, 0) + 1
    maindict[m[0]]=val
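Here m[0] is the 'fileID' (the key of my outer dictionary) and m[1] is the name of the file that has to be opened.
When I run the code, the inner dictionaries are always identical; only the keys of the outer dictionary differ. Like this:
{ID1: {sam: 1, maj: 5, tif: 7, paul: 1}, ID2: {sam: 1, maj: 5, tif: 7, paul: 1}, ..}
Does anyone know how to fix this?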
Answer 0 (score: 3)
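You never create a new val dictionary; you just keep updating the one created before the loop, so every key in maindict ends up pointing at the same object. Create a new one for each ID: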
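maindict = {}
for m in result:
    # Fetch the dict for this ID, creating a fresh empty one on first use.
    val = maindict.setdefault(m[0], {})
    # The with block closes the file once the loop is done.
    with open(path+m[1]+'.txt', 'r') as f:
        search_str = "my_name"
        for line in f:
            if line.startswith(search_str):
                linename = line.split(' ',2)[1].strip()
                key = get_name_part(linename)
                val[key] = val.get(key, 0) + 1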
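To see why the original output repeats itself, here is a minimal sketch that drops the file handling and uses made-up IDs and names (they are assumptions for illustration only). The first loop stores the same dict object under every key; the second gets a separate dict per ID via setdefault.

```python
# Hypothetical sample data: (file ID, names found in that file).
data = [("ID1", ["sam", "maj"]), ("ID2", ["bib"])]

# Buggy pattern: one dict created before the loop and reused for every ID.
shared = {}
aliased = {}
for file_id, names in data:
    for name in names:
        shared[name] = shared.get(name, 0) + 1
    aliased[file_id] = shared  # stores a reference to the same dict, not a copy

# Fixed pattern: setdefault creates a new empty dict the first time an ID is seen.
fresh = {}
for file_id, names in data:
    counts = fresh.setdefault(file_id, {})
    for name in names:
        counts[name] = counts.get(name, 0) + 1

print(aliased["ID1"] is aliased["ID2"])  # both keys refer to one object
print(fresh)
```

Assigning a dict to a key never copies it, so in the buggy pattern every later update shows up under all IDs.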
You could also use collections.Counter and collections.defaultdict:
from collections import Counter, defaultdict
import os

maindict = defaultdict(Counter)
for m in result:
    counts = maindict[m[0]]
    with open(os.path.join(path, m[1] + '.txt'), 'r') as f:
        search_str = "my_name"
        counts.update(get_name_part(line.split(None, 2)[1])
                      for line in f if line.startswith(search_str))
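For anyone who wants to try the Counter version without the original data, here is a self-contained sketch. The get_name_part stand-in, the count_names wrapper, and the sample file contents are all assumptions made for the demo; the real helper and files are not shown in the question.

```python
import os
import tempfile
from collections import Counter, defaultdict


def get_name_part(linename):
    # Hypothetical stand-in: the real get_name_part is not shown in the question.
    return linename.strip().lower()


def count_names(result, path, search_str="my_name"):
    # One Counter per file ID, created automatically on first access.
    maindict = defaultdict(Counter)
    for file_id, filename in result:
        counts = maindict[file_id]
        with open(os.path.join(path, filename + ".txt"), "r") as f:
            counts.update(
                get_name_part(line.split(None, 2)[1])
                for line in f
                if line.startswith(search_str)
            )
    return maindict


# Write two small made-up files into a temporary directory and count them.
with tempfile.TemporaryDirectory() as path:
    samples = {
        "a": "my_name sam x\nmy_name maj y\nother line\nmy_name maj z\n",
        "b": "my_name bib x\nmy_name bib y\n",
    }
    for filename, text in samples.items():
        with open(os.path.join(path, filename + ".txt"), "w") as f:
            f.write(text)
    maindict = count_names([("ID1", "a"), ("ID2", "b")], path)

print({file_id: dict(counts) for file_id, counts in maindict.items()})
```

Note that line.split(None, 2)[1] raises IndexError on a matching line that has no second field, so real input may need a guard for that case.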