Question

How can I read strings the way fscanf does? I need to read every word in a .txt file and keep a count for each word.

import collections

collectwords = collections.defaultdict(int)

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        v = ""
        for char in line:
            if char != " ":
                v = v + char
            else:
                # a word is only recorded when a space follows it
                collectwords[v] += 1
                v = ""

This way, I can't read the last word.

Answer 1

Uhm, like this?

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        for word in line.split():
            collectwords[word] += 1
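
For completeness, here is a self-contained sketch of the same idea. The character loop in the question only records a word when a space follows it, so the last word on each line (which is followed by a newline instead) is never counted. Calling split() with no arguments splits on any run of whitespace, newline included. The function name count_words is just an illustrative choice, not something from the question.

```python
import collections


def count_words(path):
    """Return a mapping from each whitespace-separated word to its count."""
    counts = collections.defaultdict(int)
    with open(path, 'r') as filetxt:
        for line in filetxt:
            # split() with no arguments splits on any run of whitespace,
            # so the trailing newline never ends up inside a word
            for word in line.split():
                counts[word] += 1
    return counts
```

Calling count_words('DatoSO.txt') would then give the per-word totals for the file from the question.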

Answer 2

If you are using Python >= 2.7, you could also consider collections.Counter:

http://docs.python.org/library/collections.html#collections.Counter

It adds a number of methods, such as most_common, which may be useful in this type of application.

From Doug Hellmann's PyMOTW:

import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

# print() with a single argument works on both Python 2.7 and Python 3
print('Most common:')
for letter, count in c.most_common(3):
    print('%s: %7d' % (letter, count))

http://www.doughellmann.com/PyMOTW/collections/counter.html - although this counts letters rather than words. In the c.update line, you would want to replace line.rstrip().lower() with line.split(), and you may need some code to remove punctuation.
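
A minimal sketch of that substitution, assuming you want case-insensitive counts: passing Counter.update a string counts its characters, while passing it the list from split() counts whole words. The function name most_common_words and the parameter n are made up for this example.

```python
import collections


def most_common_words(path, n=3):
    """Return the n most frequent words in the file as (word, count) pairs."""
    c = collections.Counter()
    with open(path, 'r') as f:
        for line in f:
            # lower() so that "The" and "the" are tallied together;
            # split() hands update() a list of words, not single letters
            c.update(line.lower().split())
    return c.most_common(n)
```

Punctuation is left attached here, so "word," and "word" would still be counted separately.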

Edit: to remove punctuation, this is probably the fastest solution:

import collections
import string

c = collections.Counter()
with open('DatoSO.txt', 'rt') as f:
    for line in f:
        # Python 2 form: str.translate(table, deletechars)
        c.update(line.translate(string.maketrans("",""), string.punctuation).split())

(Borrowed from the question "Best way to strip punctuation from a string in Python".)
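
The two-argument translate call above exists only on Python 2 strings. On Python 3, a rough equivalent (a sketch, not part of the original answer) builds the deletion table with str.maketrans, whose third argument lists the characters to remove. The names _STRIP_PUNCT and count_words_no_punct are hypothetical.

```python
import collections
import string

# Translation table that deletes every ASCII punctuation character
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def count_words_no_punct(path):
    """Count words in a file after stripping ASCII punctuation."""
    c = collections.Counter()
    with open(path, 'r') as f:
        for line in f:
            # remove punctuation first, then split on whitespace
            c.update(line.translate(_STRIP_PUNCT).split())
    return c
```

Note that this also removes apostrophes and hyphens, so "don't" is counted as "dont"; whether that is acceptable depends on the text.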

Answer 3

Python makes this easy:

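collectwords = []

# the with block closes the file once the loop finishes
with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        # split() breaks the line on whitespace and drops the newline
        collectwords.extend(line.split())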
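
The list built above holds every word but does not yet count them. One way to finish the job, which the answer itself does not show, is to hand the list to collections.Counter. The helper name tally and the sample words below are illustrative only.

```python
import collections


def tally(words):
    """Count how many times each item occurs in an iterable of words."""
    return collections.Counter(words)


# a small in-memory list standing in for collectwords
counts = tally(['to', 'be', 'or', 'not', 'to', 'be'])
print(counts['to'])  # prints 2
```

With the file-based list, tally(collectwords) would give the per-word totals the question asks for.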

Read words from a .txt file and count each word
