What regular expression matches a specific word within each sentence in JavaScript?

The rule for matching a sentence is clear: it ends with a period (.) and the next letter is uppercase.


I have a Java regex for looping over sentences, and another Java regex that matches a word with five words of context, but I need to use both of them in JavaScript.




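Cliffs have collapsed in New Zealand during an earthquake in the city of Christchurch on the South Island. No serious damage or fatalities were reported in the Valentine's Day quake that struck at 13:13 local time. Based on med. report everybody is ok.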

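Output for the selected word "on":

  1. Cliffs have collapsed in New Zealand during an earthquake in the city of Christchurch on the South Island.
  3. Based on med. report everybody is ok.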

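  1. One solution uses a single regular expression to try to parse the entire original paragraph. This can be done but, as explained below, it is probably not the best solution.

  2. The other solution is a more involved algorithm that uses lighter regular expressions. It splits the text into sentences and processes each sentence separately. This solution is more efficient and, I would say, more elegant.

Solution 1: a single regular expression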


    \. +([A-Z]([^.]|\.(?! +[A-Z]))*?" + keyword + "([^.]|\.(?! +[A-Z]))*?\.(?= +[A-Z]))



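This is a rather regex-heavy solution, and it can be quite slow. With the sample paragraph you provided, this routine becomes unbearably slow. Even at that cost it is not sophisticated enough, because it cannot tell when the keyword is embedded in another word (for example, when looking for "cats" it will also find "catsup"). It is possible to try to avoid such embedded matches, but doing so makes the whole thing too slow even to demonstrate.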
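One way to test for a whole-word match on a single, already-isolated sentence is sketched below. It is only an illustration and not part of the answer's code: the helper names `escapeRegExp` and `containsWord` are hypothetical, and `\b` treats only ASCII letters, digits, and underscores as word characters.

```javascript
// Hypothetical helper: escape regex metacharacters in a user-supplied keyword
// so that it is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when `word` occurs in `sentence` as a whole word
// (case-insensitive), false when it only appears inside a longer word.
function containsWord(sentence, word) {
  var re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
  return re.test(sentence);
}

console.log(containsWord("Dogs and cats are pets.", "cats")); // true
console.log(containsWord("Catsup is tasty.", "cats"));        // false
```

Note the doubled backslashes: inside a string literal, `"\\b"` is needed to hand `\b` to the `RegExp` constructor.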

    var text = "I like cats. I really like cats. I also like dogs. Dogs and cats are pets. Approx. half of pets are cats. Approx. half of pets are dogs. Some cats are v. expensive.";
    var keyword = "cats";
    var reStr =
      "\\. +"                   + // a preceding sentence-ender, i.e. a period
                                  //   followed by one or more spaces
      "("                       + // begin remembering the match (i.e. arr[1] below)
        "[A-Z]"                 + // a sentence-starter, i.e. an uppercase letter
        "("                     + // start of a sentence-continuer, which is either
          "[^.]"                + // anything but a period
          "|"                   + // or
          "\\.(?! +[A-Z])"      + // a period not followed by one or more spaces
                                  //   and an uppercase letter
        ")"                     + // end of a sentence-continuer
        "*?"                    + // zero or more of the preceding sentence-continuers
                                  //   but as few as possible
        keyword                 + // the keyword being sought
        "([^.]|\\.(?! +[A-Z]))" + // a sentence-continuer, as described above
        "*?"                    + // zero or more of them but as few as possible
        "\\."                   + // a sentence-ender, i.e. a period
        "(?= +[A-Z])"           + // followed by one or more spaces and an
                                  //   uppercase letter, which is not remembered
      ")";                        // finish remembering the match
    // Backslashes are doubled because these are string literals ("\." is just ".").
    // The resulting pattern, with KEYWORD standing for the keyword, is:
    //   \. +([A-Z]([^.]|\.(?! +[A-Z]))*?KEYWORD([^.]|\.(?! +[A-Z]))*?\.(?= +[A-Z]))
    var re = new RegExp(reStr, "g"); // construct the regular expression
    var sentencesWithKeyword = []; // initialize an array to keep the hits
    var arr; // prepare an array to temporarily keep 'exec' return values
    var expandedText = ". " + text + " A";
    // add a sentence-ender (i.e. a period) before the text
    //   and a sentence-starter (i.e. an uppercase letter) after the text
    //   to facilitate finding the first and last sentences
    while ((arr = re.exec(expandedText)) !== null) { // while hits are found
      sentencesWithKeyword.push(arr[1]); // remember the sentence found
      re.lastIndex -= 2; // start the next search two characters back
                         //   to allow for starting the next match
                         //   with the period that ended the current match
    }
    // show the results
    show("Text to search:");
    show(text);
    show("Query string: " + keyword);
    for (var num = 0; num < sentencesWithKeyword.length; num += 1) {
      show((num + 1) + ". " + sentencesWithKeyword[num]);
    }
    // helper that writes one paragraph to the page
    function show(msg) {
      document.write("<p>" + msg + "</p>");
    }



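Solution 2: split into sentences, then search each one

• Split the original text into an array of sentences.
• Search each sentence for the keyword.
• Keep the sentences that contain the keyword and discard those that do not.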


    var textToSearch = "I like cats. I really like cats. I also like dogs. Cats are great.  Catsup is tasty. Dogs and cats are pets. Approx. half of pets are cats. Approx. half of pets are dogs. Some cats are v. expensive.";
    var keyword = "cats";
    var sentences = {
      all           : [],
      withKeyword   : [],
      withNoKeyword : []
    };
    var sentenceRegex = new RegExp("([.]) +([A-Z])", "g");
    var sentenceSeparator = "__SENTENCE SEPARATOR__";
    var modifiedText = textToSearch.replace(sentenceRegex, "$1" + sentenceSeparator + "$2");
    sentences.all = modifiedText.split(sentenceSeparator);
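    sentences.all.forEach(function(sentence) {
      // the keyword must be delimited by spaces, a period, or the sentence start
      var keywordRegex = new RegExp("(^| +)" + keyword + "( +|[.])", "i");
      var keywordFound = keywordRegex.test(sentence);
      if (keywordFound) {
        sentences.withKeyword.push(sentence);
      } else {
        sentences.withNoKeyword.push(sentence);
      }
    });
    document.write("<pre>" + JSON.stringify(sentences, null, 2) + "</pre>");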
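As a sketch only, the same split-then-filter idea can be wrapped in a function that returns its result instead of writing to the page. The function name `sentencesWithWord` is hypothetical, and this variant assumes `\b` word boundaries are acceptable in place of the space-based keyword pattern used above.

```javascript
// Hypothetical helper: return the sentences of `text` that contain `word`
// as a whole word. A sentence boundary is a period followed by one or more
// spaces and an uppercase letter.
function sentencesWithWord(text, word) {
  var separator = "__SENTENCE SEPARATOR__";
  // Mark each sentence boundary, then split on the marker.
  var sentences = text
    .replace(/([.]) +([A-Z])/g, "$1" + separator + "$2")
    .split(separator);
  // Escape regex metacharacters so the word is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var wordRegex = new RegExp("\\b" + escaped + "\\b", "i");
  return sentences.filter(function (sentence) {
    return wordRegex.test(sentence);
  });
}

var sampleText = "I like cats. I really like cats. I also like dogs. " +
  "Cats are great.  Catsup is tasty. Dogs and cats are pets. " +
  "Approx. half of pets are cats. Approx. half of pets are dogs. " +
  "Some cats are v. expensive.";

var hits = sentencesWithWord(sampleText, "cats");
console.log(hits.length); // 6
console.log(hits);
```

Because abbreviations such as "Approx." and "v." are followed by a lowercase letter, they are not treated as sentence boundaries. An abbreviation followed by a capitalized word would still split the sentence incorrectly.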