Question

I'm trying to count how many times each of several words occurs in a file.

Here is my code:

": "

The problem is that the code only counts the instances of word1. How can I fix it so that it also counts word2 and word3?

Thanks!

Answer 1

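Rather than reading and splitting the file over and over, I think this code is better, because it gives you the term frequency of every word in the file, however many words there are: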

with open('my_output', 'r') as file:
    s = file.read().split()

tf = {}
# Count each distinct word only once instead of once per occurrence
for i in set(s):
    tf[i] = s.count(i)
print(tf)
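Even when looping over distinct words, s.count(i) rescans the whole list once per word. A single pass that increments a dict entry for each word avoids that. Here is a minimal sketch; term_frequencies is a hypothetical helper name, and the sample string stands in for the contents of my_output:

```python
def term_frequencies(text):
    """Count every whitespace-separated word in text in a single pass."""
    tf = {}
    for word in text.split():
        # dict.get returns 0 the first time a word is seen
        tf[word] = tf.get(word, 0) + 1
    return tf

# Sample text standing in for the file contents (assumption)
print(term_frequencies('wordA wordB wordA wordC wordA'))
# {'wordA': 3, 'wordB': 1, 'wordC': 1}
```

To use it with the file, pass it the result of file.read().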

Answer 2

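The main problem is that file.read() consumes the file, so the second search ends up searching an empty file. The simplest solution is to read the file once (as long as it isn't too large) and then search only the text you already read: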

#!/usr/bin/env python

with open('my_output', 'r') as file:
    text = file.read()

word1 = 'wordA'
print('wordA', text.split().count(word1))
word2 = 'wordB'
print('wordB', text.split().count(word2))
word3 = 'wordC'
print('wordC', text.split().count(word3))

To improve performance, you can also split the text only once:

#!/usr/bin/env python

with open('my_output', 'r') as file:
    split_text = file.read().split()

word1 = 'wordA'
print('wordA', split_text.count(word1))
word2 = 'wordB'
print('wordB', split_text.count(word2))
word3 = 'wordC'
print('wordC', split_text.count(word3))

Using with also ensures that the file is closed properly after it has been read.

Answer 3

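In your code, the file is consumed (exhausted) on the first line, so the following lines have nothing left to count. The first file.read() reads the entire contents of the file and returns it as a string. The second file.read() has nothing left to read and just returns the empty string '', and so does the third file.read().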
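The exhaustion described above is easy to observe directly. This minimal sketch uses io.StringIO as a stand-in for the opened my_output file (an assumption so it runs without the file; it has the same read() and seek() behavior for this purpose). It also shows that rewinding with seek(0) makes the contents readable again, although reading once and reusing the text is usually simpler:

```python
import io

# io.StringIO stands in for open('my_output', 'r') (assumption)
file = io.StringIO('wordA wordB wordC wordA\n')

first = file.read()    # reads everything; the position is now at the end
second = file.read()   # nothing left to read
print(repr(first))     # 'wordA wordB wordC wordA\n'
print(repr(second))    # ''

file.seek(0)           # rewind to the start of the stream
third = file.read()    # the full contents are available again
print(third.split().count('wordA'))  # 2
```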

Here is a version that should do what you want:

from collections import Counter

counter = Counter()

with open('my_output', 'r') as file:
    for line in file:
        counter.update(line.split())
print(counter)

You may need to do some preprocessing (to get rid of special characters such as , and . and so on).

Counter is part of the Python standard library and is very useful for this kind of task.

Note that this way you iterate over the file only once, and you never have to hold the whole file in memory.

If you only want to track certain words, you can select just those instead of passing the whole line to the counter:

from collections import Counter
import string

counter = Counter()

words = ('wordA', 'wordB', 'wordC')
chars_to_remove = str.maketrans('', '', string.punctuation)

with open('my_output', 'r') as file:
    for line in file:
        line = line.translate(chars_to_remove)
        w = (word for word in line.split() if word in words)
        counter.update(w)
print(counter)

I have also included an example of what I mean by preprocessing: punctuation is removed before counting.
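To see why that preprocessing matters, here is a small sketch applying the same translate table to a single sample line (the line itself is made up for illustration). Without it, tokens like 'wordA,' or '(wordA)' do not match 'wordA' and are silently skipped:

```python
import string
from collections import Counter

words = ('wordA', 'wordB', 'wordC')
chars_to_remove = str.maketrans('', '', string.punctuation)

# Hypothetical sample line with punctuation attached to the words
line = 'wordA, wordB. (wordA) wordC! wordD'

# Without preprocessing, no token matches exactly
raw = Counter(word for word in line.split() if word in words)
print(raw)      # Counter()

# With punctuation stripped first, the words are found
cleaned = line.translate(chars_to_remove)   # 'wordA wordB wordA wordC wordD'
counter = Counter(word for word in cleaned.split() if word in words)
print(counter)  # Counter({'wordA': 2, 'wordB': 1, 'wordC': 1})
```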

Answer 4

You can try this:

# Read and split the file once; 'with' closes it afterwards
with open('my_output', 'r') as file:
    splitFile = file.read().split()

lst = ['wordA','wordB','wordC']

for wrd in lst:
    print(wrd, splitFile.count(wrd))

Answer 5

A short solution using a collections.Counter object:

import collections

with open('my_output', 'r') as f:    
    wordnames = ('wordA', 'wordB', 'wordC')
    counts = (i for i in collections.Counter(f.read().split()).items() if i[0] in wordnames)
    for c in counts:
        print(c[0], c[1])

For the following sample line of text:

'wordA some dfasd asdasdword B wordA sdfsd sdasdasdddasd wordB wordC wordC sdfsdfsdf wordA'

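we get the following output (the line order depends on the Python version; since Python 3.7, Counter keeps the order in which words are first seen, so wordA 3 would be printed first):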

wordB 1
wordC 2
wordA 3
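Note that filtering .items() only reports words that actually occur, so a requested word that never appears produces no line at all. If you want a line for every requested word, including zero counts, you can index the Counter directly, since a missing key returns 0 instead of raising KeyError. A minimal sketch, with a sample string standing in for the file contents:

```python
from collections import Counter

# Sample text standing in for the contents of my_output (assumption)
text = 'wordA some wordB wordA wordA'
counts = Counter(text.split())

for name in ('wordA', 'wordB', 'wordC'):
    # Indexing a Counter with a missing key returns 0 (and does not add the key)
    print(name, counts[name])
# wordA 3
# wordB 1
# wordC 0
```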

Answer 6

from collections import Counter

# Create an empty word_list that stores every word from every line.
word_list = []

# Open the file for reading; 'with' closes it automatically afterwards
with open(r'my_file.txt', 'r') as file_handle:
    # Iterate over the lines in the file
    for line in file_handle:
        # Split each line into a list of words and
        # extend word_list with those words
        word_list.extend(line.split())

# Pass word_list to Counter() to get a dictionary of word counts
dictionary_of_words = Counter(word_list)

print(dictionary_of_words)

Counting the words in a file with Python
