Question

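In the code below I use a dict comprehension to try to count how many times each word occurs, including repeats, but it results in {'count': 1, 'words.As': 1, 'said': 1, 'file.\n': 1, 'this': 1, 'text': 1, 'is': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'test': 1, 'the': 1, 'repeat': 1, 'before': 1}

I don't see is counted twice, or any of the other repeated words. What am I doing wrong here?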

test_readme.txt

Hi this is some text to recognize the count of words.As said before this is only a test file ,i repeat test file.

with open('test_readme.txt') as f:
   di = { w : di[w]+1 if w in di else 1  for l in f for w in l.split(' ')}
print di

Answer 1

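You can't do this with a dict comprehension, because di does not change while the comprehension is building the new dictionary. And if you haven't already defined di, the code raises a NameError.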

>>> s = """Hi this is some text to recognize the count of words.
... As said before this is only a test file ,i repeat test file."""
>>> 
>>> di = { w : di[w]+1 if w in di else 1 for l in s.split('\n') for w in l.split(' ')}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
NameError: global name 'di' is not defined

You can use Counter() or defaultdict() from the collections module:

from collections import defaultdict

di = defaultdict(int)
with open('test_readme.txt') as f:
   for line in f:
       for w in line.strip().split():
           di[w]+=1

Demo:

>>> for line in s.split('\n'):
...    for w in line.strip().split():
...            di[w]+=1
... 
>>> di
defaultdict(<type 'int'>, {'count': 1, 'a': 1, 'said': 1, 'words.': 1, 'this': 2, 'text': 1, 'is': 2, 'of': 1, 'some': 1, 'only': 1, ',i': 1, 'to': 1, 'As': 1, 'Hi': 1, 'file': 1, 'recognize': 1, 'test': 2, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
>>>
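The 'file.\n' key in the question's output comes from splitting on a single space, which leaves the trailing newline attached to the last word. A small sketch of the difference, using a made-up line (the string below is an assumption for illustration, not taken from the file):

```python
# Hypothetical line with a double space and a trailing newline.
line = "repeat test  file.\n"

# Splitting on one space keeps the newline and yields an empty string
# for the doubled space.
by_space = line.split(' ')

# Splitting with no argument splits on any run of whitespace and
# discards leading and trailing whitespace.
by_whitespace = line.split()

print(by_space)
print(by_whitespace)
```

This is why the loop above calls split() with no argument instead of split(' ').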

Answer 2

You can't access di while it is being populated.

Instead, just use a Counter:

from collections import Counter

counter = Counter()
with open('test_readme.txt') as f:
    for line in f:
        counter += Counter(line.split())
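To try the per-line accumulation without a file on disk, the sketch below reads the two-line sample text from Answer 1 through io.StringIO, which is only a stand-in for the open file. It also uses Counter.update, which adds counts in place instead of building a new Counter for every line; that swap is this example's choice, not part of the answer above.

```python
from collections import Counter
import io

# Stand-in for open('test_readme.txt'); the text is the two-line
# sample string from Answer 1.
sample = io.StringIO(
    u"Hi this is some text to recognize the count of words.\n"
    u"As said before this is only a test file ,i repeat test file.\n"
)

counter = Counter()
for line in sample:
    # update() adds the counts of the words on this line to the totals.
    counter.update(line.split())

print(counter['this'], counter['is'], counter['test'])
```

Punctuation is not stripped here, so 'file' and 'file.' are still counted as two different words.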


Answer 3

I would use Counter on the whole string:

from collections import Counter

with open('readme.txt') as f:
   s = Counter(f.read().replace('\n', '').split(' '))

#Out[8]: Counter({'this': 2, 'is': 2, 'test': 2, 'count': 1, 'words.As': 1, 'said': 1, 'text': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})

Answer 4

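A very readable solution is

Thedict = {}
with open('sample.txt') as fo:  # the file is closed automatically
    for line in fo:
        for word in line.split(' '):
            # strip whitespace first so a trailing '.' before '\n' is removed too
            word = word.strip().strip('.')
            if word in Thedict:
                Thedict[word] = Thedict[word] + 1
            else:
                # the first occurrence counts as 1, not 0
                Thedict[word] = 1

print(Thedict)

This assumes sample.txt holds the text.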

Answer 5

Yet another Counter solution, using a nested generator expression to run through the file with a single call to Counter:

from collections import Counter

with open('test_readme.txt') as f:
    counts = Counter(word for line in f for word in line.strip().split())

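As has already been pointed out, inside the expression that produces the value being assigned you cannot access the variable it will be assigned to, in other words the intermediate result of the expression. The expression is evaluated first, and only then is the result stored. Since a dict comprehension is a single expression, it is evaluated in full before the result is bound to the name.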
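The evaluation order described above can be seen in a small sketch. It assumes di was already bound to an empty dict before the comprehension ran, which would explain why the question got a count of 1 for every word instead of a NameError; the sample string is made up for illustration.

```python
text = "this is a test this is only a test"

# Assumption: di already exists and is empty when the comprehension runs.
di = {}
di = {w: di[w] + 1 if w in di else 1 for w in text.split()}
# The comprehension only ever looked at the old, empty di,
# so "w in di" was always False and every count is 1.
print(di)

# A plain loop updates the dictionary while iterating,
# so repeated words accumulate.
counts = {}
for w in text.split():
    counts[w] = counts.get(w, 0) + 1
print(counts)
```

The name di is rebound to the new dictionary only after the whole comprehension has finished, which is why the loop (or a Counter) is needed to count repeats.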

Counting the number of words in a file using a dict comprehension - python

5 answers: