Splitting a paragraph into sentences

Time: 2015-09-21 12:56:04

Tags: javascript python regex

I am using the following Python code (which I found online a while ago) to split a paragraph into sentences.

def splitParagraphIntoSentences(paragraph):
  import re
  sentenceEnders = re.compile(r"""
      # Split sentences on whitespace between them.
      (?:               # Group for two positive lookbehinds.
        (?<=[.!?])      # Either an end of sentence punct,
      | (?<=[.!?]['"])  # or end of sentence punct and quote.
      )                 # End group of two positive lookbehinds.
      (?<!  Mr\.   )    # Don't end sentence on "Mr."
      (?<!  Mrs\.  )    # Don't end sentence on "Mrs."
      (?<!  Jr\.   )    # Don't end sentence on "Jr."
      (?<!  Dr\.   )    # Don't end sentence on "Dr."
      (?<!  Prof\. )    # Don't end sentence on "Prof."
      (?<!  Sr\.   )    # Don't end sentence on "Sr."
      \s+               # Split on whitespace between sentences.
    """, 
    re.IGNORECASE | re.VERBOSE)
  sentenceList = sentenceEnders.split(paragraph)
  return sentenceList

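This works fine for me, but now I need exactly the same regular expression in JavaScript (to make sure the output is consistent), and I am having trouble converting this Python regex into a JavaScript-compatible one.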

1 Answer:

Answer 0: (score: 2)

This is not a regex that splits directly, but rather a workaround:

(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s


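You can replace each matched fragment with, for example, $1# (or, instead of #, any other character that does not occur in the text), and then split the result on #. However, this is not a very elegant solution.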
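The replace-then-split idea could be sketched in JavaScript roughly as follows. This is only a minimal sketch: the function name, the `marker` parameter, and the check that the marker is absent from the text are assumptions added for illustration, and the `g` flag is added to the pattern so that every match is replaced. The replacer function used here has the same effect as the replacement string $1# when the marker is #.

```javascript
// Hypothetical helper; the pattern itself is the one given above, plus the g flag.
function splitParagraphIntoSentences(paragraph, marker = "#") {
  // A token ending in . ? or ! (optionally followed by a quote) and one
  // whitespace character, unless the token starts with a listed abbreviation.
  const sentenceEnd = /(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s/g;

  // Assumption: refuse to run if the marker already occurs in the text,
  // because the final split could not tell it apart from a sentence boundary.
  if (paragraph.includes(marker)) {
    throw new Error("Marker occurs in the text; choose a different one.");
  }

  // Replace the whitespace after each sentence-final token with the marker,
  // then split on the marker.
  const marked = paragraph.replace(sentenceEnd, (match, sentence) => sentence + marker);
  return marked.split(marker);
}

const sample = "Hello there. Mr. Smith is here! Is Dr. Who in? Yes.";
console.log(splitParagraphIntoSentences(sample));
// [ 'Hello there.', 'Mr. Smith is here!', 'Is Dr. Who in?', 'Yes.' ]
```

Note that this will probably not give output identical to the Python version in every case: the pattern above is case-sensitive (the Python one uses re.IGNORECASE) and consumes only a single whitespace character, so a run of several spaces between sentences leaves leading whitespace on the next sentence.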