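I wrote a simple word-count program in Python that reads a text file, counts word frequencies, and writes the result to another file. The problem is that if I search for "windows" and the text file contains the word "xwindows", that gets counted as well.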
import sys
import glob
import errno
files = glob.glob('w.asm')
the_count = ['windows']
for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read()
        print contents
        for number in the_count:
            print "windows occured-", contents.count(number)
The file w.asm contains:
windows
iwindows
qwindows
hwindows
kwindows
windows
windows
windowsh
wwindows
windows
iwindows
qwindows
hwindows
kwindows
Output:
Occurences in file -- w.asm
windows
iwindows
qwindows
hwindows
kwindows
windows
windows
windowsh
wwindows
windows
iwindows
qwindows
hwindows
kwindows
windows occured- 14
The output I actually want is 4, because "windows" occurs as a standalone word only 4 times, but the code reports 14.
Please help.
Answer 0 (score: 0)
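14 is actually correct, because str.count() counts substring matches, and windowsh, iwindows, and the rest all contain the substring windows. A simple fix is to split the file contents into words first and then call count() on the resulting list, which only matches whole elements: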
for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read().split()  # <--- split into a list of words
        print contents
        for number in the_count:
            print "windows occured-", contents.count(number)
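To make the difference concrete, here is a minimal sketch comparing the substring count, the split-based count, and a regex word-boundary count. The helper name count_whole_word and the sample strings are made up for illustration; they are not part of the original program. The regex variant is an alternative to split(), not what the answer above uses, and it may be worth considering if the words can be followed by punctuation.

```python
import re


def count_whole_word(text, word):
    """Count matches of `word` that are not embedded in a longer word.

    Hypothetical helper: uses \\b word boundaries, so "windows," still
    counts but "xwindows" and "windowsh" do not.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


sample = "windows iwindows qwindows windows windowsh wwindows windows"

# str.count on a string counts substring matches: every token contains "windows".
substring_hits = sample.count("windows")

# list.count after split() only matches tokens equal to "windows".
split_hits = sample.split().count("windows")

# The word-boundary regex agrees with split() on whitespace-separated text.
regex_hits = count_whole_word(sample, "windows")

# With punctuation attached, split() keeps "windows," as one token,
# so the two approaches differ.
punctuated = "windows, xwindows and windows."
split_punct_hits = punctuated.split().count("windows")
regex_punct_hits = count_whole_word(punctuated, "windows")
```

On whitespace-separated input like w.asm the split() and regex approaches should give the same answer; the regex only matters when tokens carry punctuation or other non-word characters.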