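I want to count the punctuation marks that end sentences, such as full stops, exclamation marks and question marks.
I wrote a small loop to do this, but I'm wondering whether there is a better way. Built-in functions are not allowed.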
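# Counters start at zero; textContent is an iterable of lines of text
numberOfFullStops = 0
numberOfQuestionMarks = 0
numberOfExclamationMarks = 0
for line in textContent:
    numberOfFullStops += line.count(".")
    numberOfQuestionMarks += line.count("?")
    numberOfExclamationMarks += line.count("!")
numberOfSentences = numberOfFullStops + numberOfQuestionMarks + numberOfExclamationMarks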
Answer 0 (score: 0)
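Assuming you want to count the terminal punctuation in a sentence, we can build a dictionary of (character, count) pairs by looping over the characters of each string and keeping only the punctuation.
Demo
Here are three options, listed from top to bottom, that use intermediate- to beginner-level data structures: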
import collections as ct
sentence = "Here is a sentence, and it has some exclamations!!"
terminals = ".?!"
# Option 1 - Counter and Dictionary Comprehension
cd = {c:val for c, val in ct.Counter(sentence).items() if c in terminals}
cd
# Out: {'!': 2}
# Option 2 - Default Dictionary
dd = ct.defaultdict(int)
for c in sentence:
    if c in terminals:
        dd[c] += 1
dd
# Out: defaultdict(int, {'!': 2})
# Option 3 - Regular Dictionary
d = {}
for c in sentence:
    if c in terminals:
        if c not in d:
            d[c] = 0
        d[c] += 1
d
# Out: {'!': 2}
To extend this to a separate list of sentences, named sentences here, wrap a loop around one of the latter options:
for sentence in sentences:
    # add option here
Note: to total the punctuation marks for each sentence, sum dict.values(), e.g. sum(cd.values()).
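As a minimal sketch of the loop and the total described above, the following wraps Option 3 in a helper and applies it to each sentence. The sample sentences and the names count_terminals, per_sentence and totals are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data; any list of strings would do.
sentences = [
    "Is this a question?",
    "Yes, it is!!",
    "This one ends with a full stop.",
]
terminals = ".?!"


def count_terminals(sentence, terminals=terminals):
    """Return a {character: count} dict of terminal punctuation (Option 3)."""
    counts = {}
    for c in sentence:
        if c in terminals:
            if c not in counts:
                counts[c] = 0
            counts[c] += 1
    return counts


# One dictionary per sentence, then one total per sentence.
per_sentence = [count_terminals(s) for s in sentences]
totals = [sum(d.values()) for d in per_sentence]

print(per_sentence)
print(totals)
```

Keeping the per-sentence dictionaries separate from the totals makes it possible to inspect which marks were counted before they are summed.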
Update: assuming you want to split the text into sentences at terminal punctuation, use a regular expression:
import re
line = "Here is a string of sentences. How do we split them up? Try regular expressions!!!"
# Option - Regular Expression and List Comprehension
pattern = r"[.?!]"
sentences = [sentence for sentence in re.split(pattern, line) if sentence]
sentences
# Out: ['Here is a string of sentences', ' How do we split them up', ' Try regular expressions']
len(sentences)
# Out: 3
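Note that line contains 5 terminal characters but only 3 sentences, so counting characters alone would overstate the number of sentences. Splitting with a regular expression is therefore the more reliable approach.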
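If only the number of sentences is needed, one possible variation (not part of the original answer) is to count runs of terminal punctuation rather than splitting. The sketch below assumes that any run of one or more terminals ends exactly one sentence, so text containing abbreviations or decimal numbers would still be over-counted. The names terminal_count and sentence_count are hypothetical.

```python
import re

line = "Here is a string of sentences. How do we split them up? Try regular expressions!!!"

# Every individual terminal character is a separate match.
terminal_count = len(re.findall(r"[.?!]", line))

# A run of one or more terminals, such as "!!!", is a single match.
sentence_count = len(re.findall(r"[.?!]+", line))

print(terminal_count)
print(sentence_count)
```

For this line the first count is 5 and the second is 3, which matches the result of the split above.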
References