Counting how many times a word occurs in a text file

Date: 2017-03-22 03:51:36

Tags: python, word-count

I want to count the number of times each word appears in a text file, and I'm not sure what is wrong. When I run it, I get a count of 0. I'm also having trouble finding a way to include capitalized forms of a word in the count (counting occurrences of both "dog" and "Dog").

def main():
    text_file = open("textfile.txt", "r")

    dog_count = 0
    cat_count = 0

    for word in text_file.readlines():
        if word == 'dog':
            dog_count = dog_count + 1
        else:
            dog_count = dog_count

    print('the word dog occurs', dog_count, 'times')

3 Answers:

Answer 0 (score: 0)

I believe your problem is that you are looping over the lines of the file rather than the words. You need to add another loop to iterate over each word in a line.

Warning: the following example is untested, but it should be close enough.

def main():
    text_file = open("textfile.txt", "r")

    dog_count = 0
    cat_count = 0

    for line in text_file.readlines():
        for word in line.split():
            if word == 'dog':
                dog_count = dog_count + 1

    print('the word dog occurs',dog_count,'times')
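As a quick illustration (not part of the answer) of why the original loop finds nothing: readlines() yields whole lines, so no element ever equals 'dog'. A minimal sketch, using io.StringIO to stand in for the real file:

```python
import io

# io.StringIO stands in for textfile.txt here.
f = io.StringIO("the dog barks\ndog and cat\n")
lines = f.readlines()
print(lines)               # ['the dog barks\n', 'dog and cat\n']
print('dog' in lines)      # False: every element is a full line

# Splitting each line into words is what makes the comparison work.
words = [w for line in lines for w in line.split()]
print(words.count('dog'))  # 2
```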

Answer 1 (score: 0)

You can convert the text to upper/lower case while searching:

def main():
    text_file = open("textfile.txt", "r")

    dog_count = 0
    cat_count = 0

    for line in text_file.readlines():
        for word in line.split():
            word = word.lower()  # case conversion
            if word == 'dog':
                dog_count = dog_count + 1

    print("The word dog occurs", dog_count, "times")

main()

It should work fine; tested and working for me. :)
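A quick check (not part of the answer) that the lowercasing step is what makes the match case-insensitive:

```python
# Hypothetical sample words; only the .lower() normalization matters here.
words = ["Dog", "DOG", "dog", "cat"]
dog_count = sum(1 for w in words if w.lower() == 'dog')
print(dog_count)  # 3
```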

Answer 2 (score: 0)

Answer to the question of "why the output is wrong": you need to iterate over each word in a line.

Suggestion: when you are searching for more than one word, you can put the words in a dict and store each count as the value of the corresponding dict key.

File contents:

Hi this is hello
Hello is my name

Then

text_file.readlines()

would give

['Hi this is hello\n', 'Hello is my name\n']

text_file.read().splitlines()
['Hi this is hello', 'Hello is my name']

Then, splitting each of the lines:

lines = list(map(str.split, text_file.read().splitlines()))
[['Hi', 'this', 'is', 'hello'], ['Hello', 'is', 'my', 'name']]

Then, chaining the iterables:

list(it.chain.from_iterable(map(str.split, text_file.read().splitlines())))
['Hi', 'this', 'is', 'hello', 'Hello', 'is', 'my', 'name']

search = ['dog', 'cat']  # the words that you need to count
search = dict.fromkeys(search, 0)  # gives a dict: {'dog': 0, 'cat': 0}

So, for your problem:

def main():
    import itertools as it
    text_file = open("textfile.txt", "r")
    search = ['cat', 'dog']
    search = dict.fromkeys(search, 0)
    for word in it.chain.from_iterable(map(str.split, text_file.read().splitlines())):
        if word.lower() in search:
            search[word.lower()] = search[word.lower()] + 1
    for word, count in search.items():
        print('the word %s occurs %d times' % (word, count))

This also counts the capitalized forms of the words!
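As a side note, the same counting can be sketched with collections.Counter from the standard library; the inline sample text below stands in for the contents of textfile.txt:

```python
from collections import Counter

# Sample text standing in for textfile.txt.
text = "The dog chased the cat\nDog and dog met a Cat\n"

# Lowercase every word so 'Dog' and 'dog' count together.
counts = Counter(word.lower() for line in text.splitlines() for word in line.split())

for word in ('dog', 'cat'):
    print('the word %s occurs %d times' % (word, counts[word]))
```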

Hope it helps!