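In the code below I am trying to use a dictionary comprehension to count how many times each word occurs, including repeats, but it results in {'count': 1, 'words.As': 1, 'said': 1, 'file.\n': 1, 'this': 1, 'text': 1, 'is': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'test': 1, 'the': 1, 'repeat': 1, 'before': 1}
I don't see 'is' counted twice, or any of the other repeated words. What am I doing wrong here?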
test_readme.txt
Hi this is some text to recognize the count of words.As said before this is only a test file ,i repeat test file.
with open('test_readme.txt') as f:
    di = { w : di[w]+1 if w in di else 1 for l in f for w in l.split(' ')}
print di
Answer 0 (score: 2)
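You can't use a dictionary comprehension for this, because di is not updated while the dictionary is being built, and if you have not defined di beforehand the code raises a NameError.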
>>> s = """Hi this is some text to recognize the count of words.
... As said before this is only a test file ,i repeat test file."""
>>>
>>> di = { w : di[w]+1 if w in di else 1 for l in s.split('\n') for w in l.split(' ')}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
NameError: global name 'di' is not defined
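The all-ones result in the question is consistent with di already being bound to an empty dict before the comprehension ran. That is an assumption, since the question does not show such a line. A minimal sketch of that situation, with a shortened sample string in place of the file:

```python
text = "Hi this is some text this is only a test"

# Assumption: di already exists and is empty before the comprehension runs.
di = {}

# Every `w in di` test looks at the old, empty dict, because the name di
# is only rebound after the whole comprehension has been evaluated.
di = {w: di[w] + 1 if w in di else 1 for w in text.split(' ')}

# 'this' and 'is' each occur twice in text, yet both end up mapped to 1.
repeated = (di['this'], di['is'])
```

Each repeated key is simply written again with the value 1, which matches the dictionary shown in the question.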
You can use defaultdict() or Counter() from the collections module:
from collections import defaultdict
di = defaultdict(int)
with open('test_readme.txt') as f:
    for line in f:
        for w in line.strip().split():
            di[w]+=1
Demo:
>>> for line in s.split('\n'):
...     for w in line.strip().split():
...         di[w]+=1
...
>>> di
defaultdict(<type 'int'>, {'count': 1, 'a': 1, 'said': 1, 'words.': 1, 'this': 2, 'text': 1, 'is': 2, 'of': 1, 'some': 1, 'only': 1, ',i': 1, 'to': 1, 'As': 1, 'Hi': 1, 'file': 1, 'recognize': 1, 'test': 2, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
>>>
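The answer mentions Counter() without showing it. A minimal sketch of the same loop with collections.Counter, assuming the text is held in a string s as in the demo rather than read from the file. Counter.update adds the counts of an iterable in place:

```python
from collections import Counter

# Same sample text as in the demo, inlined here as an assumption.
s = """Hi this is some text to recognize the count of words.
As said before this is only a test file ,i repeat test file."""

counter = Counter()
for line in s.split('\n'):
    # update() adds one count for each word on this line
    counter.update(line.strip().split())
```

As with the defaultdict version, punctuation stays attached to the words, so 'file' and 'file.' are counted separately.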
Answer 1 (score: 2)
You cannot access di while it is being populated.
Instead, just use a Counter:
from collections import Counter
counter = Counter()
with open('test_readme.txt') as f:
    for line in f:
        counter += Counter(line.split())
Answer 2 (score: 1)
I would use Counter on the whole string:
from collections import Counter
with open('readme.txt') as f:
    s = Counter(f.read().replace('\n', '').split(' '))
#Out[8]: Counter({'this': 2, 'is': 2, 'test': 2, 'count': 1, 'words.As': 1, 'said': 1, 'text': 1, 'of': 1, 'some': 1, ',i': 1, 'to': 1, 'only': 1, 'Hi': 1, 'a': 1, 'file': 1, 'recognize': 1, 'the': 1, 'file.': 1, 'repeat': 1, 'before': 1})
Answer 3 (score: 1)
A very readable solution is:
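Thedict = {}
with open('sample.txt') as fo:
    for line in fo:
        for word in line.split(' '):
            # trim whitespace first, then any leading or trailing period
            word = word.strip().strip('.')
            if word in Thedict:
                Thedict[word] = Thedict[word] + 1
            else:
                # the first occurrence counts as 1, not 0
                Thedict[word] = 1
print(Thedict)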
This assumes sample.txt holds the text.
Answer 4 (score: 1)
Yet another Counter solution, using a nested generator expression to run through the file iteratively with a single call to Counter:
from collections import Counter
with open('test_readme.txt') as f:
    counts = Counter(word for line in f for word in line.strip().split())
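As already pointed out, you cannot access the variable being assigned from within the expression that produces its value, or in other words, you cannot see the intermediate results of an expression. The expression is evaluated first, and only then is the result stored. Since a dictionary comprehension is a single expression, it is evaluated in full before its result is stored.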
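The evaluate-then-store order is not specific to dictionaries. A small sketch with a list comprehension (the name values is purely illustrative) shows the right-hand side seeing only the old binding:

```python
values = [1, 2, 3]

# The comprehension is evaluated completely while `values` still names the
# old three-element list; only afterwards is the name rebound to the result.
values = [len(values) for _ in range(5)]

# values is now [3, 3, 3, 3, 3]: len() saw the old list on every iteration.
```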