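Counting the words in a file with a dictionary comprehension - python

Time: 2016-06-13 08:52:11

Tags: python

In the code below I am trying to use a dictionary comprehension to count how often each word occurs, including repeats, but the result is {'count': 1, 'words.As': 1, 'said': 1, 'file.\n': 1, 'this': 1, 'text': 1, 'is': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'test': 1, 'the': 1, 'repeat': 1, 'before': 1}

I don't see is counted twice, nor any of the other repeated words. What am I doing wrong here?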

test_readme.txt

Hi this is some text to recognize the count of words.As said before this is only a test file ,i repeat test file.

with open('test_readme.txt') as f:
   di = { w : di[w]+1 if w in di else 1  for l in f for w in l.split(' ')}
print di

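5 answers:

Answer 0 (score: 2)

You cannot use a dictionary comprehension for this, because di is not updated while the comprehension is being built. And if the dictionary has not been defined beforehand, the code raises a NameError: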

>>> s = """Hi this is some text to recognize the count of words.
... As said before this is only a test file ,i repeat test file."""
>>> 
>>> di = { w : di[w]+1 if w in di else 1 for l in s.split('\n') for w in l.split(' ')}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
NameError: global name 'di' is not defined
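
The all-ones output in the question is consistent with di already being bound to an existing, probably empty, dict when the comprehension ran. That is an assumption; the question does not say so. A minimal sketch of that situation in Python 3, using a made-up sample string:

```python
# Assumption: `di` already exists (for example, left over from an earlier
# attempt in the same session), which is why no NameError is raised.
di = {}

text = "this is a test this is only a test"  # made-up sample text

# The comprehension builds a brand-new dict. The name `di` still refers to
# the old, empty dict until the whole expression has been evaluated, so
# `w in di` is always False and every word gets the value 1.
di = {w: di[w] + 1 if w in di else 1 for w in text.split()}

print(di)
```

Every value is 1 here even though several words repeat, which matches the symptom described in the question.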

You can use defaultdict() or Counter() from the collections module instead:

from collections import defaultdict

di = defaultdict(int)
with open('test_readme.txt') as f:
   for line in f:
       for w in line.strip().split():
           di[w]+=1

Demo:

>>> for line in s.split('\n'):
...    for w in line.strip().split():
...            di[w]+=1
... 
>>> di
defaultdict(<type 'int'>, {'count': 1, 'a': 1, 'said': 1, 'words.': 1, 'this': 2, 'text': 1, 'is': 2, 'of': 1, 'some': 1, 'only': 1, ',i': 1, 'to': 1, 'As': 1, 'Hi': 1, 'file': 1, 'recognize': 1, 'test': 2, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
>>> 

Answer 1 (score: 2)

You cannot access di while it is being populated.

Instead, just use Counter:

from collections import Counter

counter = Counter()
with open('test_readme.txt') as f:
    for line in f:
        counter += Counter(line.split())
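
A possible variation on this answer, not part of the original: Counter.update() accepts an iterable of items and adds their counts in place, so no temporary Counter is needed for each line. The sketch below uses a made-up list of strings in place of an open file so that it runs on its own.

```python
from collections import Counter

# Made-up lines standing in for the lines of an open file.
lines = [
    "Hi this is some text\n",
    "this is only a test\n",
]

counter = Counter()
for line in lines:
    # update() adds one count per word, in place.
    counter.update(line.split())

print(counter["this"], counter["is"], counter["Hi"])
```

With a real file, the loop would iterate over the file object inside a with open(...) block, as in the answer.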


Answer 2 (score: 1)

I would use Counter on the whole string:

from collections import Counter

with open('readme.txt') as f:
   s = Counter(f.read().replace('\n', '').split(' '))

#Out[8]: Counter({'this': 2, 'is': 2, 'test': 2, 'count': 1, 'words.As': 1, 'said': 1, 'text': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})

Answer 3 (score: 1)

A very readable solution is:

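Thedict = {}
# The with block closes the file when the loop is done.
with open('sample.txt') as fo:
    for line in fo:
        for word in line.split(' '):
            # Drop surrounding whitespace first, then leading/trailing periods.
            word = word.strip().strip('.')
            if word in Thedict:
                Thedict[word] = Thedict[word] + 1
            else:
                # The first occurrence counts as 1, not 0.
                Thedict[word] = 1

print(Thedict)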

This assumes that sample.txt holds the text.

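Answer 4 (score: 1)

Yet another Counter solution, this one using a nested generator expression so that the whole file is processed with a single call to Counter: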

from collections import Counter

with open('test_readme.txt') as f:
    counts = Counter(word for line in f for word in line.strip().split())

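As has already been pointed out, you cannot access the variable being assigned to from inside the expression that produces its value; in other words, you cannot see the intermediate result of the expression. The expression is evaluated first, and only then is the result stored. Since a dictionary comprehension is a single expression, it is evaluated in full before its result is bound to the name.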
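
If a dictionary comprehension is still wanted, one option is to make each value depend only on data that already exists before the comprehension runs. The sketch below is an illustration rather than something proposed in the answers: it inlines the sample text from the question and uses list.count(), which rescans the list for every distinct word and is therefore slower than Counter on large inputs.

```python
# The sample text from the question, inlined so the sketch is self-contained.
text = ("Hi this is some text to recognize the count of words."
        "As said before this is only a test file ,i repeat test file.")

words = text.split()

# Each value is computed from the finished `words` list, never from the
# dict that is being built, so the comprehension does not refer to itself.
counts = {w: words.count(w) for w in set(words)}

print(counts["this"], counts["is"], counts["test"])
```

For anything beyond small inputs, the Counter-based answers above are likely the better choice.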