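Is there a better way to count punctuation in a sentence?

Time: 2017-10-21 15:26:32

Tags: python counting punctuation

I want to count the punctuation marks that end sentences, e.g. full stops, exclamation marks and question marks.

I wrote a small loop to do this, but I was wondering whether there is a better way. Using built-in functions is not allowed.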

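# textContent is assumed to be an iterable of text lines
numberOfFullStops = numberOfQuestionMarks = numberOfExclamationMarks = 0
for line in textContent:
    numberOfFullStops += line.count(".")
    numberOfQuestionMarks += line.count("?")
    numberOfExclamationMarks += line.count("!")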

numberOfSentences = numberOfFullStops + numberOfQuestionMarks + numberOfExclamationMarks

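1 Answer:

Answer 0: (score: 0)

Assuming you want to count the terminal punctuation in a sentence, we can build a dictionary of (character, count) pairs by looping over the characters of each string and keeping only the punctuation.

Demo

Here are three options, listed from top to bottom and ranging from intermediate to beginner data structures: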

import collections as ct


sentence = "Here is a sentence, and it has some exclamations!!"
terminals = ".?!"

# Option 1 - Counter and Dictionary Comprehension
cd = {c:val for c, val in ct.Counter(sentence).items() if c in terminals}
cd
# Out: {'!': 2}


# Option 2 - Default Dictionary
dd = ct.defaultdict(int)
for c in sentence:
    if c in terminals:
        dd[c] += 1
dd
# Out: defaultdict(int, {'!': 2})


# Option 3 - Regular Dictionary
d = {}
for c in sentence:
    if c in terminals:
        if c not in d:
            d[c] = 0
        d[c] += 1
d
# Out: {'!': 2}

To extend this further to a list of separate sentences, wrap one of the latter options in a loop.

for sentence in sentences:
    # add one of the options here
    pass  # placeholder so the loop is valid Python

Note: to get the total punctuation count for each sentence, sum the dictionary's values, e.g. sum(cd.values())
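As a minimal sketch of the loop and the sum described above, the following applies Option 1 to each sentence in a list. The sample sentences and the names per_sentence and totals are made up for illustration and are not part of the original answer.

```python
import collections as ct

terminals = ".?!"
# Hypothetical sample data for illustration
sentences = [
    "Is this the first sentence?",
    "Here is a sentence, and it has some exclamations!!",
    "No terminal here",
]

# One {character: count} dictionary per sentence (Option 1)
per_sentence = []
for sentence in sentences:
    counts = {c: n for c, n in ct.Counter(sentence).items() if c in terminals}
    per_sentence.append(counts)

# Total terminal punctuation per sentence
totals = [sum(counts.values()) for counts in per_sentence]

print(per_sentence)  # [{'?': 1}, {'!': 2}, {}]
print(totals)        # [1, 2, 0]
```

A sentence with no terminal punctuation yields an empty dictionary, whose values sum to 0.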

Update: if you want to split the text into sentences on terminal punctuation, use a regular expression:

import re


line = "Here is a string of sentences.  How do we split them up?  Try regular expressions!!!"


# Option - Regular Expression and List Comprehension
pattern = r"[.?!]"
sentences = [sentence for sentence in re.split(pattern, line) if sentence]
sentences
# Out: ['Here is a string of sentences', '  How do we split them up', '  Try regular expressions']

len(sentences)
# Out: 3

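Note that line contains 5 terminal characters but only 3 sentences, so counting characters alone would overcount. Splitting with a regular expression is therefore the more reliable approach here.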
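If only the number of sentences is needed, one possible variation (not part of the original answer) is to count runs of consecutive terminal characters with re.findall instead of splitting. The names terminal_chars and terminal_runs are illustrative.

```python
import re

line = "Here is a string of sentences.  How do we split them up?  Try regular expressions!!!"

# Every individual terminal character
terminal_chars = re.findall(r"[.?!]", line)
# Runs of one or more consecutive terminals, so "!!!" counts once
terminal_runs = re.findall(r"[.?!]+", line)

print(len(terminal_chars))  # 5
print(len(terminal_runs))   # 3
```

This is still a heuristic: abbreviations such as "e.g." or decimal numbers would be counted as sentence endings.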
