Question

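I have a file containing a paragraph of text, and I just want to count how often each word occurs. I tried it the following way, but I am not getting any output. Can anyone help me?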

Python,2
is,3
good,1
helps,1
in,2
machine,2
learning,1
learning,1
goos,1
python,1
famous,1
kill,1
the,1
machine,1
it,1
a,1
good,1
day,1

This is the output I am getting.


Answer 1

Looking at your code, I found the following problems:

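for s in l: l is a single line of text, so this for loop iterates over its characters, not its words.
f.split('\n'): this expression raises an error, because f is a file object and has no .split() method; only strings do.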
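Both problems are easy to reproduce in isolation. The snippet below is only a sketch: the sample text is made up, and io.StringIO stands in for a real file object so that it runs without a file on disk.

```python
import io

line = "Python is good"

# Looping over a string yields single characters...
chars = [s for s in line]
# ...while str.split() yields the words.
words = line.split()

# io.StringIO is used here as a stand-in for an open text file (an assumption
# made so the example is self-contained). Like a real file object, it has no
# .split() method, so f.split('\n') would raise AttributeError.
f = io.StringIO("a b\nc d\n")
has_split = hasattr(f, "split")

# Reading the contents first gives a string, which can be split.
lines = f.read().split("\n")

print(chars[:3])
print(words)
print(has_split)
print(lines)
```

Note that splitting the whole text on '\n' leaves a trailing empty string when the text ends with a newline, which is one reason to iterate over the file line by line instead.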

With that in mind, here is your code rewritten so that it works:

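dic = {}
with open("f1.txt", 'r') as f:
    for l in f:                         # iterating over a file yields one line at a time
        for w in l.split():             # split() with no argument splits on any whitespace
            dic[w] = dic.get(w, 0) + 1  # get() returns 0 for a word not seen yet
print('\n'.join('%s,%s' % (k, v) for k, v in dic.items()))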
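The output in the question lists Python and python as separate words. If that is not what you want, one option is to normalise each word before counting. This is only a sketch under two assumptions of mine, not something the question states: that counting should be case-insensitive, and that leading and trailing punctuation should be ignored. The name word_frequencies and the sample lines are made up for illustration; collections.Counter is the standard-library counterpart of the dict.get pattern.

```python
import string
from collections import Counter


def word_frequencies(lines):
    """Count words across an iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            # Assumption: "Python", "python" and "python." are the same word.
            word = word.strip(string.punctuation).lower()
            if word:  # skip tokens that were only punctuation
                counts[word] += 1
    return counts


# Made-up sample input; any iterable of strings works.
sample = ["Python is good.", "python helps in machine learning"]
freqs = word_frequencies(sample)

# Print in the same word,count format, most frequent first.
for word, n in freqs.most_common():
    print(f"{word},{n}")
```

Because iterating over an open file yields its lines, the same function should also accept a file object directly, for example word_frequencies(f) inside a with open("f1.txt") as f: block.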

Answer 2

A pure-Python way, with no imports needed. It is more code, but I felt like writing some bad code today (:

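file = open('path/to/file.txt', 'r')
# Join all lines into one string, separated by single spaces
content = ' '.join(line for line in file.read().splitlines())
# split() with no argument avoids counting '' as a word on blank lines or repeated spaces
content = content.split()
freqs = {}
for word in content:
    if word not in freqs:
        freqs[word] = 1   # first time this word is seen
    else:
        freqs[word] += 1
file.close()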

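This uses a Python dictionary to store each word and the number of times it occurs. I know that using with open(blah) as b: would be better, but this is just to get the idea across. ¯\_(ツ)_/¯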

Answer 3

You can use the count method:

mystring = "hello hello hello"
mystring.count("hello")  # 3
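One caveat worth keeping in mind: str.count counts non-overlapping substring matches, not whole words, so a short word can also be counted inside longer ones. The sketch below uses a made-up sentence to show the difference, together with two whole-word alternatives.

```python
import re

text = "the other theme is the best"

# str.count matches substrings: "the" also occurs inside "other" and "theme".
substring_hits = text.count("the")

# Counting in the list of words only matches whole words.
split_hits = text.split().count("the")

# A regular expression with word boundaries does the same on the raw string.
regex_hits = len(re.findall(r"\bthe\b", text))

print(substring_hits, split_hits, regex_hits)
```

For a single known word this may be enough, but for the frequency of every word in a file a dictionary-based count avoids rescanning the text once per word.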

Counting the frequency of words in a file using Python

3 answers