Python: word frequency in a file

Time: 2017-04-07 07:38:46

Tags: python-2.7

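I wrote a simple word-count program in Python that reads a text file, counts word frequencies, and writes the results to another file. The problem is that if I search for "windows" and the text file contains the word "xwindows", that gets counted too.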
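For reference, a word-frequency program of the kind described here can be sketched with `collections.Counter`. This is an illustrative sketch, not the asker's code: the helper names `word_frequencies`, `format_frequencies` and `count_file` are hypothetical, and "word" is assumed to mean a whitespace-separated token (case-sensitive, punctuation kept attached).

```python
from collections import Counter


def word_frequencies(text):
    # str.split() with no argument splits on any run of whitespace,
    # so each element is a whole token and "xwindows" != "windows".
    return Counter(text.split())


def format_frequencies(freqs):
    # One "word count" line per distinct word, most frequent first.
    return ["%s %d\n" % (word, n) for word, n in freqs.most_common()]


def count_file(in_path, out_path):
    # Read in_path, count its words, write the table to out_path,
    # and return the Counter so the caller can look up single words.
    with open(in_path) as f:
        freqs = word_frequencies(f.read())
    with open(out_path, "w") as out:
        out.writelines(format_frequencies(freqs))
    return freqs
```

With the w.asm file shown below, `count_file("w.asm", "counts.txt")["windows"]` (the output file name is hypothetical) should be 4, because only exact tokens are counted.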

import sys
import glob
import errno
files = glob.glob('w.asm')
the_count =['windows']
for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read()
        print contents
        for number in the_count:
            print "windows occured-", contents.count(number)

The w.asm file contains:

windows
iwindows
qwindows
hwindows
kwindows
windows
windows
windowsh
wwindows
windows
iwindows
qwindows
hwindows
kwindows

Output:

Occurences in file -- w.asm 

windows
iwindows
qwindows
hwindows
kwindows
windows
windows
windowsh
wwindows
windows
iwindows
qwindows
hwindows
kwindows
windows occured- 14

The output I actually want is 4, because the word "windows" on its own occurs only 4 times, but the code gives 14.

Please help.

1 answer:

Answer 0 (score: 0)

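14 is actually correct, because windowsh and the other lines all contain the substring windows: `str.count()` counts substring occurrences, not whole words. A simple fix is to split the file contents into words first and then call `count()` on the resulting list, which only counts elements exactly equal to "windows":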
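The difference between the two `count()` methods can be seen on the question's data directly. In this sketch the file contents are inlined as a string (an assumption, so it runs without w.asm); `str.count` finds the substring in all 14 lines, while `list.count` on the split words only matches the 4 lines that are exactly "windows".

```python
# The same 14 lines as w.asm in the question, joined with newlines.
contents = "\n".join([
    "windows", "iwindows", "qwindows", "hwindows", "kwindows",
    "windows", "windows", "windowsh", "wwindows", "windows",
    "iwindows", "qwindows", "hwindows", "kwindows",
])

# str.count counts every non-overlapping occurrence of the substring,
# so "iwindows", "windowsh", etc. all match.
substring_hits = contents.count("windows")

# str.split() returns a list of words; list.count counts only the
# elements that are exactly equal to "windows".
word_hits = contents.split().count("windows")
```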

for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read().split() # <--- split into a list of words
        print contents
        for number in the_count:
            print "windows occured-", contents.count(number)
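Splitting on whitespace treats "windows," or "windows." as different words. If the text may contain punctuation, one alternative (a different technique from the answer above, swapped in here for illustration) is a regular expression with `\b` word boundaries. The helper name `count_whole_word` is hypothetical. Note that `\b` treats letters, digits and underscore as word characters, so "my_windows" would not match but "windows-xp" would.

```python
import re


def count_whole_word(text, word):
    # \b is a zero-width word boundary: the match may not be preceded
    # or followed by a letter, digit or underscore. re.escape guards
    # against regex metacharacters in the searched word.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))
```

On the question's w.asm contents this should also give 4, and unlike `split()` it still counts "windows," followed by a comma.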