How to count the number of sentences in a paragraph in Python

Asked: 2013-03-05 15:44:42

Tags: python

This is what I have so far. My paragraph contains only 5 full stops, so it has only 5 sentences, but the code still returns 14 as the answer. Can anyone help?

file = open ('words.txt', 'r')
lines= list (file)
file_contents = file.read()
print(lines)
file.close()
words_all = 0
for line in lines:
    words_all = words_all + len(line.split())
    print ('Total words:   ', words_all)
full_stops = 0
for stop in lines:
    full_stops = full_stops + len(stop.split('.'))
print ('total stops:    ', full_stops)

This is the txt file:

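A Turing machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechanical computation.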

4 Answers:

Answer 0 (score: 4)

If a line contains no period, split returns a single element, the line itself:

>>> "asdasd".split('.')
    ['asdasd']

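So you are counting the number of lines plus the number of periods: every line contributes its period count plus one. Why are you splitting the file into lines at all?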

with open('words.txt', 'r') as file:
    file_contents = file.read()

    print('Total words:   ', len(file_contents.split()))
    print('total stops:    ', file_contents.count('.'))
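To make the overcount concrete, here is a minimal sketch that runs both counting strategies on an in-memory string instead of words.txt. The sample text and the helper names `count_by_split_per_line` and `count_periods` are made up for illustration and are not part of the original code.

```python
def count_by_split_per_line(text):
    """Reproduce the question's loop: sum len(line.split('.')) over all lines."""
    total = 0
    for line in text.splitlines():
        # split('.') yields (number of periods in the line + 1) pieces
        total += len(line.split('.'))
    return total


def count_periods(text):
    """Count full stops directly, without splitting into lines."""
    return text.count('.')


# Hypothetical sample: 3 lines containing 2 periods in total
sample = "First sentence. Second\nsentence continues\nhere."

print(count_by_split_per_line(sample))  # 5, i.e. 2 periods + 3 lines
print(count_periods(sample))            # 2
```

By the same arithmetic, a file with, say, 9 lines and 5 periods would make the per-line loop report 14.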

Answer 1 (score: 2)

Use a regular expression.

In [13]: import re
In [14]: par  = "This is a paragraph? So it is! Ok, there are 3 sentences."
In [15]: re.split(r'[.!?]+', par)
Out[15]: ['This is a paragraph', ' So it is', ' Ok, there are 3 sentences', '']
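The split result ends with an empty string because the paragraph finishes with a terminator, so calling len() on it directly would overcount by one. One possible way to turn it into a sentence count is to drop the blank pieces, as in this sketch. The function name `count_sentences` is hypothetical, and text containing abbreviations such as "e.g." or decimal numbers would still be miscounted.

```python
import re


def count_sentences(text):
    """Count chunks separated by runs of '.', '!' or '?', ignoring blank pieces."""
    pieces = re.split(r'[.!?]+', text)
    return sum(1 for piece in pieces if piece.strip())


par = "This is a paragraph? So it is! Ok, there are 3 sentences."
print(count_sentences(par))  # 3
```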

Answer 2 (score: 1)

The simplest way is:

import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize

paragraph = 'A Turing machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechanical computation.'

# sent_tokenize returns a list with one string per detected sentence
sentences = sent_tokenize(paragraph)

print(len(sentences))

Output:

5

Answer 3 (score: 0)

Try

print("total stops: ", open('words.txt', 'r').read().count("."))

In more detail:

with open("words.txt") as f:
    data = f.read()
    print("total stops: ", data.count("."))