Question

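I want to open a .txt file and put all the words in the file into a dictionary. After that I want to count the total number of words in the dictionary.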

The .txt file contains 5 lines:

elephant calculator fish
towel onion fish
nandos pigeon tiger
cheeky peg lion
dog cat fish

This is what I have so far:

words = 0 
dictionary = []
with open('file.txt','r') as file:
    for x in inf:
        dictionary.split(x)
        words += 1
print(words)

Sorry for the poorly constructed question.

Answer 1

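A simple way to get a count of the unique words is to use a set. I put your text into a file named 'qdata.txt'.

The file is very small, so there is no need to read it line by line: just read the whole thing into a single string, split that string on whitespace, and pass the resulting list to the set constructor: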

fname = 'qdata.txt'
with open(fname) as f:
    words = set(f.read().split())
print(words, len(words))

Output

set(['towel', 'onion', 'nandos', 'calculator', 'pigeon', 'dog', 'cat', 'tiger', 'lion', 'cheeky', 'elephant', 'peg', 'fish']) 13

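This works because "a set object is an unordered collection of distinct hashable objects". If you try to add a duplicate item to a set, it is ignored. See the docs for details.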
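A minimal sketch of that behavior, using a few words from the question's file with one repeated (the name `seen` is only illustrative and is not part of the code above):

```python
# Adding a word that is already in the set leaves the set unchanged.
seen = set()
for word in ["dog", "cat", "fish", "fish"]:
    seen.add(word)

print(len(seen))     # 3
print(sorted(seen))  # ['cat', 'dog', 'fish']
```

Sets are unordered, so `sorted` is used here only to get a predictable printout.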

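For larger files it is a good idea to read and process them line by line, to avoid loading the whole file into RAM. With modern operating systems, though, the file needs to be fairly large before you see any benefit, because of file caching.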

fname = 'qdata.txt'
words = set()
with open(fname) as f:
    for line in f:
        words.update(line.split())

print(words, len(words))

Answer 2

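You have several problems, but the basic strategy is sound.

dictionary is actually a list... which is what you want anyway. Rename it.
You open the file as file, which is fine in Python 3 but frowned upon in Python 2 because it shadows the built-in file object. People are still sensitive about that, so it is best to use a different name.
You didn't use the file variable; instead you invented something called inf.
You split the wrong thing. You want to split the line x that was read from the file.
There is no need to count the words... lists know how long they are.

So, this will work better: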

words = []
with open('file.txt') as fileobj:
    for x in fileobj:
        words += x.strip().split()
print(len(words))

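collections.Counter is commonly used to count occurrences of words. Assuming you can use anything in the standard library, this will work (note that I lowercase everything so that Elephant and elephant are counted as the same word):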

import collections
# start with an empty counter; Counter() takes an iterable or mapping, not a type
words = collections.Counter()
with open('file.txt') as fileobj:
    for x in fileobj:
        # lowercase each word so 'Elephant' and 'elephant' share one count
        words.update(word.lower() for word in x.strip().split())
# words is a dict-like object with a count of each word
print(len(words))
print(words)
# let's pick one
print('elephant count', words['elephant'])
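As a hedged illustration of what the counter gives you, here is a small sketch that inlines the question's five lines as a string instead of reading 'file.txt' (the names `text`, `counts`, `total_words` and `unique_words` are made up for this example):

```python
import collections

# The question's sample text, inlined so the sketch runs without a file.
text = """elephant calculator fish
towel onion fish
nandos pigeon tiger
cheeky peg lion
dog cat fish"""

counts = collections.Counter(word.lower() for word in text.split())

total_words = sum(counts.values())  # every word, duplicates included
unique_words = len(counts)          # distinct words only
print(total_words, unique_words)    # 15 13
print(counts.most_common(1))        # [('fish', 3)]
```

So `len()` on the counter gives the number of distinct words, while summing its values gives the total including repeats.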

Answer 3

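This is probably inefficient and would never be used in a situation like this, but since I am new as well, I was wondering why the following couldn't be used to remove the duplicates.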

words = []
with open('file.txt') as fileobj:
    for x in fileobj:
        words += x.strip().split()
    for i in words:
        if words.count(i) > 1:
            words.remove(i)
print (len(words))
print (words)

Most of the code is thanks to tdelaney.

Open a .txt file and put each word in a dictionary

3 answers: