Question

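I want to count sentence-ending punctuation marks, such as full stops, exclamation marks, and question marks.

I wrote a small loop to do this, but I am wondering whether there is a better way. Built-in functions are not allowed.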

numberOfFullStops = 0
numberOfQuestionMarks = 0
numberOfExclamationMarks = 0

for line in textContent:
    numberOfFullStops += line.count(".")
    numberOfQuestionMarks += line.count("?")
    numberOfExclamationMarks += line.count("!")

numberOfSentences = numberOfFullStops + numberOfQuestionMarks + numberOfExclamationMarks

Answer 1

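Assuming you want to count the terminal punctuation marks in a sentence, we can build a (character, count) dictionary by looping over the characters of each string and keeping only the punctuation.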

Demo

Here are three options, listed from top to bottom in order of intermediate to beginner-level data structures:

import collections as ct


sentence = "Here is a sentence, and it has some exclamations!!"
terminals = ".?!"

# Option 1 - Counter and Dictionary Comprehension
cd = {c:val for c, val in ct.Counter(sentence).items() if c in terminals}
cd
# Out: {'!': 2}


# Option 2 - Default Dictionary
dd = ct.defaultdict(int)
for c in sentence:
    if c in terminals:
        dd[c] += 1
dd
# Out: defaultdict(int, {'!': 2})


# Option 3 - Regular Dictionary
d = {}
for c in sentence:
    if c in terminals:
        if c not in d:
            d[c] = 0
        d[c] += 1
d
# Out: {'!': 2}

To extend this to a separate list of sentences, wrap one of the latter options in a loop.

for sentence in sentences:
    pass  # add one of the options above here

Note: to get the total number of punctuation marks per sentence, sum the dictionary's values, e.g. sum(cd.values()).
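As a rough sketch of the two notes above, the loop below applies the plain-dictionary option to every sentence in a list and then totals each dictionary with sum(). The count_terminals helper name and the sample sentences list are made up for illustration and are not part of the original answer.

```python
terminals = ".?!"


def count_terminals(sentence):
    """Return a {character: count} dict of the terminal punctuation in sentence."""
    counts = {}
    for c in sentence:
        if c in terminals:
            counts[c] = counts.get(c, 0) + 1
    return counts


# Hypothetical sample data, not taken from the question.
sentences = ["Is this one?", "Yes!!", "No terminals here"]

per_sentence = [count_terminals(s) for s in sentences]
totals = [sum(d.values()) for d in per_sentence]

print(per_sentence)  # [{'?': 1}, {'!': 2}, {}]
print(totals)        # [1, 2, 0]
```

Keeping the per-sentence dictionaries separate from the totals makes it possible to report either the breakdown by character or the overall count.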

Update: assuming you want to split the text into sentences at the terminal punctuation, use a regular expression:

import re


line = "Here is a string of sentences.  How do we split them up?  Try regular expressions!!!"


# Option - Regular Expression and List Comprehension
pattern = r"[.?!]"
sentences = [sentence for sentence in re.split(pattern, line) if sentence]
sentences
# Out: ['Here is a string of sentences', '  How do we split them up', '  Try regular expressions']

len(sentences)
# Out: 3

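Note that line contains 5 terminal characters but only 3 sentences, so counting terminals alone would overcount. Splitting with a regular expression is therefore the more reliable approach.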
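If only the number of sentences is needed, one possible variation (not part of the original answer) is to treat each run of consecutive terminals as a single sentence ending and count the runs with re.findall. This is a sketch that assumes every sentence ends with at least one terminal; text containing abbreviations such as "e.g." would still be miscounted.

```python
import re

line = "Here is a string of sentences.  How do we split them up?  Try regular expressions!!!"

# Each match is one run of terminals, such as "." or "!!!".
endings = re.findall(r"[.?!]+", line)

number_of_terminals = sum(len(run) for run in endings)
number_of_sentences = len(endings)

print(endings)              # ['.', '?', '!!!']
print(number_of_terminals)  # 5
print(number_of_sentences)  # 3
```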


Is there a better way to count punctuation marks in a sentence?
