Question

I am trying to get the count of each word in a text file with the code below.

def count_words(file_name):
    with open(file_name, 'r') as f: return reduce(lambda acc, x: acc.get(x, 0) + 1,   sum([line.split() for line in f], []), dict())

But I get this error:

File "C:\Python27\abc.py", line 173, in count_words
with open(file_name, 'r') as f: return reduce(lambda acc, x: acc.get(x, 0) + 1, sum([line.split() for line in f], []), dict())
File "C:\Python27\abc.py", line 173, in <lambda>
with open(file_name, 'r') as f: return reduce(lambda acc, x: acc.get(x, 0) + 1, sum([line.split() for line in f], []), dict())
AttributeError: 'int' object has no attribute 'get'

I can't make sense of this error message. Why does it complain that 'int' has no attribute 'get' even though I pass a dict as the accumulator?

Answer 1

You can use collections.Counter to count the words:

In [692]: t='I am trying to get the counts of each word in a text file with the below code'
In [693]: from collections import Counter

In [694]: Counter(t.split())
Out[694]: Counter({'the': 2, 'a': 1, 'code': 1, 'word': 1, 'get': 1, 'I': 1, 'of': 1, 'in': 1, 'am': 1, 'to': 1, 'below': 1, 'text': 1, 'file': 1, 'each': 1, 'trying': 1, 'with': 1, 'counts': 1})

In [695]: c=Counter(t.split())

In [696]: c['the']
Out[696]: 2
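Applied to a file rather than a string, the same idea might look like the sketch below. It reuses the count_words name from the question and assumes words are simply whitespace-separated tokens, with no handling of case or punctuation.

```python
from collections import Counter

def count_words(file_name):
    """Return a Counter mapping each whitespace-separated word to its count."""
    with open(file_name, 'r') as f:
        # Counter accepts any iterable of hashable items, so feed it words lazily.
        return Counter(word for line in f for word in line.split())
```

A Counter behaves like a dict, except that looking up a word that never occurred gives 0 instead of raising KeyError.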

Answer 2

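The problem is that your lambda returns an int, not a dict.

reduce passes the return value of each call back in as acc for the next call. So even though you seed it with a dict, on the second call of your lambda acc is the result of the first call, acc.get(x, 0) + 1, which is an int rather than a dict.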
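If you want to keep reduce, one possible fix is to have the reducing function update the dict and then return the dict itself. This is only a sketch: add_word and count_words_in_lines are made-up names, and it takes any iterable of lines (an open file works) so it can be tried without a file on disk.

```python
from functools import reduce  # a builtin on Python 2, but must be imported on Python 3

def add_word(acc, word):
    """Reducer step: bump the count for word and hand the same dict back."""
    acc[word] = acc.get(word, 0) + 1
    return acc  # returning the dict is the step the original lambda was missing

def count_words_in_lines(lines):
    # Flatten the lines into words lazily instead of using sum(list_of_lists, []).
    words = (word for line in lines for word in line.split())
    return reduce(add_word, words, {})

counts = count_words_in_lines(["one flew over the ocean", "one flew over the sea"])
```

Because add_word always returns the dict, acc is a dict on every call and acc.get keeps working.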

Answer 3

So, if you're looking for a one-liner, I have something that is almost a one-liner and in the spirit of what you are trying to do.

>>> words = """One flew over the ocean
... One flew over the sea
... My Bonnie loves pizza
... but she doesn't love me"""
>>>
>>> f = open('foo.txt', 'w')
>>> f.writelines(words)
>>> f.close()

The "one-liner" (actually a two-liner):

>>> word_count = {}
>>> with open('foo.txt', 'r') as f:
...     _ = [word_count.update({word:word_count.get(word,0)+1}) for word in f.read().split()]
...

Result:

>>> word_count
{'but': 1, 'One': 2, 'the': 2, 'she': 1, 'over': 2, 'love': 1, 'loves': 1, 'ocean': 1, "doesn't": 1, 'pizza': 1, 'My': 1, 'me': 1, 'flew': 2, 'sea': 1, 'Bonnie': 1}

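I suppose you could do something with a dict comprehension, but I can't see how to use get in that case.

In any case, f.read().split() gives you a nice flat list of words, which should be easier than trying to pull the words out of a list of lines. Unless you have a huge file, it is the better approach.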
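For what it's worth, the dict comprehension idea can be made to work without get, because a comprehension has no way to read the dict it is still building. A rough sketch, using an in-memory string in place of f.read() and counting with list.count instead:

```python
text = """One flew over the ocean
One flew over the sea"""

words = text.split()
# A comprehension cannot consult the dict under construction,
# so count each distinct word directly in the word list.
word_count = {word: words.count(word) for word in set(words)}
```

Note that words.count rescans the whole list once per distinct word, so this is only reasonable for small inputs.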

Python: using a dict as an accumulator
