Calculate the average number of sentences per paragraph

Time: 2019-05-02 04:00:17

Tags: python dictionary

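I have a file from which I need to compute the average number of words per sentence and the average number of sentences per paragraph. A paragraph is any number of sentences followed by a blank line or the end of the text. A sentence is a sequence of words followed by a period, comma, or exclamation mark, which must in turn be followed either by a quotation mark (so the sentence ends a quotation or a piece of speech) or by whitespace (a space, tab, or newline). Please do not use any modules such as regular expressions.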
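To make the two definitions concrete, here is a minimal sketch of how they could be applied with plain string operations only. The names `split_paragraphs`, `split_sentences`, `TERMINATORS`, `QUOTES`, and `WHITESPACE` are hypothetical. The sketch assumes straight quotes only, treats a terminator at the very end of the text as ending a sentence, and ignores trailing text that has no terminator.

```python
TERMINATORS = ".,!"    # terminator set as stated in the question
QUOTES = "\"'"         # assumption: straight double and single quotes only
WHITESPACE = " \t\n"   # space, tab, newline


def split_paragraphs(text):
    """Split text into paragraphs separated by blank (whitespace-only) lines."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:  # the last paragraph may end at the end of the text
        paragraphs.append("\n".join(current))
    return paragraphs


def split_sentences(paragraph):
    """Split a paragraph into sentences using the terminator rule."""
    sentences, start = [], 0
    for i, ch in enumerate(paragraph):
        if ch not in TERMINATORS:
            continue
        nxt = paragraph[i + 1] if i + 1 < len(paragraph) else ""
        # assumption: a terminator at the very end of the text also counts
        if nxt == "" or nxt in QUOTES + WHITESPACE:
            # keep a closing quote with the sentence it ends
            end = i + 2 if nxt != "" and nxt in QUOTES else i + 1
            sentence = paragraph[start:end].strip()
            if sentence:
                sentences.append(sentence)
            start = end
    return sentences


text = 'Hello there. "Stop!" she said.\n\nOne more paragraph, with a comma.'
paragraphs = split_paragraphs(text)
sentences_per_paragraph = [split_sentences(p) for p in paragraphs]
# first paragraph: ['Hello there.', '"Stop!"', 'she said.']
# second paragraph: ['One more paragraph,', 'with a comma.']
```

Because the comma is in the terminator set as stated, a clause ending in a comma counts as a sentence here. If that is not intended, removing `,` from `TERMINATORS` is the only change needed.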

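I tried the following code, but it sometimes raised a ZeroDivisionError. I have worked out how to find the average number of words in a sentence, but I am still stuck on finding the average number of sentences per paragraph under the conditions above.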

with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    # read the whole file once; a second f.read() would return an empty string
    text = f.read()

# count the words on every non-blank line of the text
lines = [line for line in text.splitlines() if line.strip()]
total_sum = sum(len(line.split()) for line in lines)
# guard against ZeroDivisionError when the file has no non-blank lines
average_words_per_line = total_sum / len(lines) if lines else 0

# this is just a sample sentence from the file, not the whole file
sample_sentence = "I am not a proper archæologist nor an anthropologist nor an ethnologist"
# average word length (characters per word) for the sample sentence
words = sample_sentence.split()
average_word_length = sum(len(word) for word in words) / len(words)
print(round(average_word_length, 2))
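A ZeroDivisionError of this kind usually means the divisor is the length of an empty list. One common way to end up there, offered as a guess and not a diagnosis, is reading the same file object twice: the second `read()` returns an empty string because the read position is already at the end. The sketch below shows the effect with `io.StringIO`, which is used here only as a stand-in for an open file. The helper `average_word_length` is a hypothetical name.

```python
import io  # only used to stand in for an open file in this demonstration


def average_word_length(text):
    """Average characters per whitespace-separated word; 0.0 if there are no words."""
    words = text.split()
    if not words:  # avoids dividing by zero on empty input
        return 0.0
    return sum(len(word) for word in words) / len(words)


stream = io.StringIO("one two three")
first = stream.read()   # the full contents
second = stream.read()  # empty string: the read position is already at the end

first_average = average_word_length(first)
second_average = average_word_length(second)  # 0.0 instead of ZeroDivisionError
```

Storing the result of a single `read()` in a variable and reusing it avoids the problem at the source.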

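It should return something like the following. These are only sample values showing what the result should look like, not the actual answer. I do not know the actual numbers because I have not computed them yet and have only experimented with the sample sentence.

{"avg_words_per_sentence": 5.42, "avg_sentences_per_paragraph": 8}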
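As a rough illustration of the desired output shape, the sketch below builds such a dictionary from text that has already been split into paragraphs and sentences. The function name `summarize` and the nested-list input format are assumptions made for this example. Words are taken to be whitespace-separated tokens, and averages are rounded to two decimals.

```python
def summarize(paragraphs):
    """Build the result dictionary.

    paragraphs: a list of paragraphs, each given as a list of sentence strings
    (hypothetical input format for this sketch).
    """
    sentence_count = sum(len(paragraph) for paragraph in paragraphs)
    word_count = sum(
        len(sentence.split())
        for paragraph in paragraphs
        for sentence in paragraph
    )
    return {
        # fall back to 0.0 so empty input never divides by zero
        "avg_words_per_sentence":
            round(word_count / sentence_count, 2) if sentence_count else 0.0,
        "avg_sentences_per_paragraph":
            round(sentence_count / len(paragraphs), 2) if paragraphs else 0.0,
    }


example = [
    ["Proin eget turpis nisi.", "Aenean et vestibulum nulla."],
    ["Suspendisse nec auctor orci."],
]
result = summarize(example)
# {'avg_words_per_sentence': 4.0, 'avg_sentences_per_paragraph': 1.5}
```

Both keys match the names in the sample output above. Everything else about the function is illustrative.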

Here is one of the paragraphs of text I want to process.

Phasellus fringilla luctus magna, a finibus justo dapibus a. Nam risus felis, rhoncus eget diam sit amet, congue facilisis nibh. Interdum et malesuada fames ac ante ipsum primis in faucibus. Praesent consequat euismod diam, eget volutpat magna convallis at. Mauris placerat pellentesque imperdiet. Nulla porta scelerisque enim, et scelerisque neque bibendum in. Proin eget turpis nisi. Suspendisse ut est a erat egestas eleifend at euismod arcu. Donec aliquet, nisi sed faucibus condimentum, nisi metus dictum eros, nec dignissim justo odio id nulla. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Maecenas sollicitudin, justo id elementum eleifend, justo neque aliquet nibh, finibus malesuada metus erat eget neque. Suspendisse nec auctor orci. Aenean et vestibulum nulla. Nullam hendrerit augue tristique, commodo metus id, sodales lorem. Etiam feugiat dui est, vitae auctor risus convallis non.

0 answers:

No answers