Words

Question

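I'm trying to write a Python word counter that counts the words in a file and stores the counts in a dictionary. However, my counter only counts each word once, and I'm not sure why. Also, is there a way to do this without using collections.Counter?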

cloud = {}
val = 0
with open('objects.txt', 'r') as file:
    for line in file:
        for thing in line:
            new_thing = thing.strip(' ')
            cloud[new_thing] = val
            for new_thing in cloud:
                cloud[new_thing] = cloud.get(new_thing, val) + 1

Answer 1

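In your code, on every iteration you set

cloud[new_thing] = 0

This resets the counter for the word new_thing (in your code, val is 0).

Since you already use cloud.get(new_thing, 0), which returns 0 if the key new_thing is not found, you can simply delete that line.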
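
Putting that fix together, a minimal sketch of the counter without collections.Counter might look like the following. The function name count_words and the sample lines are assumptions made for illustration. It also assumes that words are separated by whitespace, so it uses line.split() rather than iterating over the characters of each line.

```python
def count_words(lines):
    """Count whitespace-separated words using a plain dict (hypothetical helper)."""
    cloud = {}
    for line in lines:
        # split() with no arguments splits on any run of whitespace
        for thing in line.split():
            # get() returns 0 the first time a word is seen
            cloud[thing] = cloud.get(thing, 0) + 1
    return cloud


# Sample input standing in for the lines of objects.txt (assumed data)
sample = ["apple pear apple\n", "pear apple\n"]
print(count_words(sample))
```

An open file can be passed in place of the list, since iterating over a file yields its lines.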

Answer 2

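Besides resetting the value for every new_thing to 0 (cloud[new_thing] = 0), as others have pointed out, there is another major problem: the loop for new_thing in cloud: iterates over every key already stored in cloud, so each pass increments the count of every word seen so far instead of only the current one. The loop is unnecessary, because a dictionary lets you look up a key directly rather than sequentially.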

You can replace

new_thing = thing.strip(string.punctuation)
cloud[new_thing] = 0
for new_thing in cloud:
    cloud[new_thing] = cloud.get(new_thing, 0) + 1

with just:

new_thing = thing.strip(string.punctuation)
cloud[new_thing] = cloud.get(new_thing, 0) + 1

Or use collections.Counter, as others have suggested. It already does what you are trying to do and would probably make your work easier.

Answer 3

You can use the setdefault method of Python dictionaries:

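for thing in line.split():
    new_thing = thing.strip(' ')
    # setdefault returns the current count, inserting 0 the first time a word is seen
    count = cloud.setdefault(new_thing, 0)
    cloud[new_thing] = count + 1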
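
As a self-contained sketch of the same idea, the loop below counts the words of an in-memory string with setdefault. The sample text and the names counts and current are assumptions for illustration only.

```python
text = "red blue red green blue red"  # assumed sample data

counts = {}
for word in text.split():
    # setdefault inserts the key with 0 if it is missing and returns its value
    current = counts.setdefault(word, 0)
    counts[word] = current + 1

print(counts)
```

Compared with cloud.get(word, 0), setdefault also stores the default in the dictionary, which makes no practical difference here because the count is assigned on the next line anyway.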

Answer 4

I would extract the part that splits the file into lines and words and strips the punctuation:

import collections
import string


def strip_punctuation(lines):
    for line in lines:
        # Note: iterating over a str yields characters; see the PS below
        for word in line:
            yield word.strip(string.punctuation)


with open('objects.txt', 'r') as file:
    cloud = collections.Counter(strip_punctuation(file))

Or, more concisely, using itertools.chain and map:

import collections
import itertools
import string

with open('objects.txt', 'r') as file:
    words = itertools.chain.from_iterable(file)
    words_no_punctuation = map(lambda x: x.strip(string.punctuation), words)
    cloud = collections.Counter(words_no_punctuation)


PS: for thing in line: does not split the line into words, but into characters. I guess you meant for thing in line.split():

The last option then becomes:

import collections
import itertools
import string

with open('objects.txt', 'r') as file:
    # Split each line into words, then flatten into a single stream of words
    words_per_line = map(lambda line: line.split(), file)
    words = itertools.chain.from_iterable(words_per_line)
    words_no_punctuation = map(lambda x: x.strip(string.punctuation), words)
    cloud = collections.Counter(words_no_punctuation)
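
To try this pipeline without a file on disk, the sketch below feeds it an in-memory io.StringIO object, which iterates line by line like an open file. The sample text is an assumption for illustration, and each word is passed through strip(string.punctuation) as in the answer.

```python
import collections
import io
import itertools
import string

# In-memory stand-in for open('objects.txt') (assumed sample data)
file = io.StringIO("cat, dog. cat!\nbird cat dog\n")

# Split each line into words, then flatten into one stream of words
words_per_line = map(str.split, file)
words = itertools.chain.from_iterable(words_per_line)
# strip() only removes punctuation at the ends of each word
words_no_punctuation = map(lambda w: w.strip(string.punctuation), words)
cloud = collections.Counter(words_no_punctuation)

print(cloud.most_common())
```

Because map and chain.from_iterable are lazy, nothing is read until Counter consumes the stream, so with a real file the Counter call has to stay inside the with block.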

Python word counter only counts each word once

4 answers:
