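I have been trying to write a script that searches a text file for a pattern, counts how many times it occurs, and then inserts the result into a dictionary as a key/value pair.
Here is the code: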
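import re

fname = raw_input("File name: ")
vars = dict()
lst = list()
count = 0
try:
    fhand = open(fname, "r+")
except IOError:
    print "File not found"
    quit()
for line in fhand:
    # rstrip() returns a new string, so the result has to be assigned
    line = line.rstrip()
    # note: `pattern` is never defined in this snippet; it stands for the regex being searched
    if re.search(pattern, line):
        x = re.findall(pattern, line)
        lst.append(x)
    else:
        continue
for x in lst:
    count += 1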
What is the best way to take the text extracted by the regex methods and insert it into a dictionary so that it looks like this:
{'pattern' : count, 'pattern' : count, 'pattern' : count}
Answer 0 (score: 0)
Do you mean something like this?
import re

pattern1 = r'([a-z]+)'
pattern2 = r'([0-9])'
regex1 = re.compile(pattern1)
regex2 = re.compile(pattern2)
filename = "somefile.txt"
d = dict()
with open(filename, "r") as f:
    for line in f:
        # add this line's match count to the running total for each pattern
        d[pattern1] = d.get(pattern1, 0) + len(regex1.findall(line))
        d[pattern2] = d.get(pattern2, 0) + len(regex2.findall(line))
print d
# output: {'([0-9])': 9, '([a-z]+)': 23}
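The same idea extends to any number of patterns. What follows is only a minimal sketch, written for Python 3 (the snippets in this thread are Python 2). The function name count_patterns and the sample lines are made up for illustration, and an in-memory list stands in for the file.

```python
import re


def count_patterns(lines, patterns):
    """Return {pattern string: total number of matches across all lines}."""
    # Compile each pattern once instead of once per line.
    compiled = {p: re.compile(p) for p in patterns}
    totals = {p: 0 for p in patterns}
    for line in lines:
        for p, regex in compiled.items():
            totals[p] += len(regex.findall(line))
    return totals


# Hypothetical sample data standing in for the lines of a file.
sample_lines = ["abc 123", "de 4"]
result = count_patterns(sample_lines, [r'([a-z]+)', r'([0-9])'])
print(result)
```

Keying the dictionary on the pattern string, as the answer does, gives one total per pattern rather than one per distinct matched text.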
Answer 1 (score: 0)
You can do it like this:
import re

fhand = ["<abc> <abc>", "<abc>", "<d>"]
counts = {}
pattern = re.compile(r'<\w+>')  # insert your own regex here
for line in fhand:
    for match in pattern.findall(line):
        # initialize the count for this match to 0 if it does not yet exist
        counts.setdefault(match, 0)
        counts[match] += 1
which gives
counts = {'<abc>': 3, '<d>': 1}
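As a possible variation, collections.defaultdict can take over the "start at 0" step that setdefault performs. This is a sketch for Python 3 that reuses the example regex and sample lines from this answer; the variable name lines is an assumption chosen to avoid suggesting a real file handle.

```python
import re
from collections import defaultdict

pattern = re.compile(r'<\w+>')  # example regex; substitute your own
lines = ["<abc> <abc>", "<abc>", "<d>"]  # sample data standing in for a file

counts = defaultdict(int)  # a missing key is created with the value int() == 0
for line in lines:
    for match in pattern.findall(line):
        counts[match] += 1

print(dict(counts))
```

Here the key is the matched text itself, so each distinct match gets its own entry in the dictionary.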
Answer 2 (score: 0)
First, I would open your file with a with statement rather than a bare open() call.
For example:
with open(fname, "r+") as fhand:
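Also, I think you have misunderstood the point of dictionaries. They are key/value stores, which means every key is unique. You cannot have the same key more than once.
I think a better solution would be the following: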
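To illustrate the point about unique keys, here is a small Python 3 sketch. The key names and values are invented for the example.

```python
# A dict literal that repeats a key keeps only the last value given for it.
d = {'pattern': 1, 'pattern': 2, 'pattern': 3}
print(d)

# Assigning to an existing key overwrites its value instead of adding an entry.
tally = {}
tally['<abc>'] = 1
tally['<abc>'] = tally['<abc>'] + 1
tally['<d>'] = 1
print(tally)
```

This is why the desired result needs a distinct key for each thing being counted, such as the matched text or the pattern string.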
import collections

for line in fhand:
    line = line.rstrip()
    if re.search(pattern, line):
        x = re.findall(pattern, line)
        # extend rather than append: Counter needs hashable items, and lists are not
        lst.extend(x)
    else:
        continue
counted = collections.Counter(lst)
print counted
This returns a dictionary-like object that maps each item in the list to the number of times it occurs.
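To make the Counter approach concrete, here is a minimal self-contained sketch for Python 3. The regex and sample lines are placeholders, not the asker's real data. Counter requires hashable items, so the matches are fed in as individual strings rather than as one list per line.

```python
import re
from collections import Counter

pattern = re.compile(r'<\w+>')  # placeholder regex (an assumption)
lines = ["<abc> <abc>\n", "<abc>\n", "<d>\n"]  # stands in for the open file

counted = Counter()
for line in lines:
    # update() counts every element of the iterable returned by findall
    counted.update(pattern.findall(line.rstrip()))

print(counted.most_common(1))  # most frequent match first
as_dict = dict(counted)        # plain dict, if one is needed
print(as_dict)
```

A Counter also returns 0 for keys it has never seen, which avoids a KeyError when looking up a pattern that did not occur.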