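Counting the letters in each word of a paragraph in Python

Time: 2020-07-27 00:52:31

Tags: python list counting word letter

If I need to write a function that reads a large paragraph and prints how many words of each length the paragraph contains, how would I do that?

Here is what I have tried so far.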

#split the piece of writing into a list so I can search over every word
#need to find out how to make this not take so long

piece = "hello world"
words = piece.split()

#search over every word, have a variable for each number, check to see length of word and add correspondingly

# start every counter at zero once, before the loop, so the counts are not reset for each word
one = 0
two = 0
three = 0
four = 0
five = 0
six = 0
seven = 0
eight = 0
nine = 0
ten = 0
eleven = 0
twelve = 0
thirteen = 0
other = 0
total = 0

# iterate over the words themselves (range(words) raises a TypeError because words is a list)
for word in words:
    total += 1
    if len(word) == 1:
        one += 1
    elif len(word) == 2:
        two += 1
    elif len(word) == 3:
        three += 1
    elif len(word) == 4:
        four += 1
    elif len(word) == 5:
        five += 1
    elif len(word) == 6:
        six += 1
    elif len(word) == 7:
        seven += 1
    elif len(word) == 8:
        eight += 1
    elif len(word) == 9:
        nine += 1
    elif len(word) == 10:
        ten += 1
    elif len(word) == 11:
        eleven += 1
    elif len(word) == 12:
        twelve += 1
    elif len(word) == 13:
        thirteen += 1
    else:
        # words longer than 13 letters
        other += 1

#print results
print(f'Proportion of 1-letter words: {one / total * 100}% {one} words')
print(f'Proportion of 2-letter words: {two / total * 100}% {two} words')
print(f'Proportion of 3-letter words: {three / total * 100}% {three} words')
print(f'Proportion of 4-letter words: {four / total * 100}% {four} words')
print(f'Proportion of 5-letter words: {five / total * 100}% {five} words')
print(f'Proportion of 6-letter words: {six / total * 100}% {six} words')
print(f'Proportion of 7-letter words: {seven / total * 100}% {seven} words')
print(f'Proportion of 8-letter words: {eight / total * 100}% {eight} words')
print(f'Proportion of 9-letter words: {nine / total * 100}% {nine} words')
print(f'Proportion of 10-letter words: {ten / total * 100}% {ten} words')
print(f'Proportion of 11-letter words: {eleven / total * 100}% {eleven} words')
print(f'Proportion of 12-letter words: {twelve / total * 100}% {twelve} words')
print(f'Proportion of 13-letter words: {thirteen / total * 100}% {thirteen} words')
print(f'Proportion of longer words: {other / total * 100}% {other} words')

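I think my two problems are that I don't know how to make the loop run over the entire paragraph, and I don't know how to write the code so that it doesn't take forever when the text is large.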

1 answer:

Answer 0 (score: 4)

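  • Try to avoid repeating code. For example, it is easier to use one dictionary (say, stats) instead of many separate variables, and to increment the entry for each word's length (stats[len(word)] += 1).
  • Python comes with "batteries included", which can greatly reduce the amount of code you have to write. In this case, defaultdict or Counter from the collections module may help.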
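The first bullet's idea can be sketched with `collections.defaultdict(int)`: a missing key starts at 0, so `stats[len(word)] += 1` works without any setup. This is a minimal illustration; the helper name `count_lengths` and the sample sentence are hypothetical.

```python
from collections import defaultdict

def count_lengths(text):
    """Map each word length to how many words have that length (hypothetical helper)."""
    stats = defaultdict(int)  # a length not seen yet starts at 0
    for word in text.split():
        stats[len(word)] += 1
    return stats

stats = count_lengths("the quick brown fox jumps over the lazy dog")
for length in sorted(stats):
    print(length, stats[length])
```

One dictionary replaces the fourteen separate counters, and lengths above 13 need no special `else` branch.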

Applying these, you end up with something like:

from collections import Counter

paragraph = "hello world"  # the text to analyze

stats = Counter(len(word) for word in paragraph.split())
total_words = sum(stats.values())

for length in sorted(stats.keys()):
    print("proportion of %d-letter words: %f" % (length, stats[length] / total_words))
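To get output closer to the question's format (a percentage plus a word count per length), the Counter version can be wrapped in a function. It makes a single pass over the words, so the running time grows linearly with the length of the text. The name `length_report` and the sample sentence are hypothetical. Note that `str.split()` leaves punctuation attached, so "world!" would count as 6 letters.

```python
from collections import Counter

def length_report(paragraph):
    """Return (length, count, percentage) tuples sorted by word length (hypothetical helper)."""
    stats = Counter(len(word) for word in paragraph.split())  # one pass over the words
    total_words = sum(stats.values())
    # for an empty paragraph stats is empty, so no division by zero happens
    return [
        (length, count, count / total_words * 100)
        for length, count in sorted(stats.items())
    ]

for length, count, percent in length_report("hello big world"):
    print(f"Proportion of {length}-letter words: {percent:.1f}% ({count} words)")
```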

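UPD: Side note: when you iterate over a dictionary, Python yields only its keys. Counter is a subclass of dict, so it behaves the same way. For brevity it is therefore fine to write just for length in sorted(stats):, but this may look unintuitive to readers unfamiliar with this Python behavior. stats.keys() gives the same result and is more explicit.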
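The side note can be checked directly: iterating a Counter, like any dict, yields its keys, so both comprehensions below collect the same lengths in the same order. The sample text is arbitrary.

```python
from collections import Counter

stats = Counter(len(word) for word in "one three five seven".split())

by_iteration = [length for length in sorted(stats)]    # iterates over the keys implicitly
by_keys = [length for length in sorted(stats.keys())]  # same keys, spelled out
print(by_iteration == by_keys)
```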