Question

I know this regular expression splits text into sentences. Can someone help me understand how it works?

/(?<!\..)([\?\!\.])\s(?!.\.)/

Answer 1

You can use YAPE::Regex::Explain to decipher Perl regular expressions:

use strict;
use warnings;
use YAPE::Regex::Explain;

my $re = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;
print YAPE::Regex::Explain->new($re)->explain();

__END__

The regular expression:

(?-imsx:(?<!\..)([\?\!\.])\s(?!.\.))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )                        end of look-behind
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [\?\!\.]                 any character of: '\?', '\!', '\.'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Answer 2

Regular Expression Analyzer does exactly what toolic already suggested, but it is entirely web-based.

Answer 3

(?         # Find a group (don't capture)
<          # before the following regular expression
!          # that does not match
\.         # a literal "."
.          # followed by 1 character
)          # (End look-behind group)
(          # Start a group (capture it to $1)
[\?\!\.]   # Containing any one of the characters in the following set "?!."
)          # End group $1
\s         # followed by a whitespace character " ", \t, etc.
(?         # Followed by a group (don't capture)
           # after the preceding regular expression
!          # that does not have
.          # 1 character
\.         # followed by a literal "."
)          # (End look-ahead group)
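The breakdown above splits tokens such as (?<! across several lines for readability, so it cannot be pasted into Perl as written. As a rough sketch, the same pattern can be laid out with the /x modifier, which ignores whitespace and # comments but still needs each (?<! and (?! kept in one piece. The variable names and the sample sentence are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the question, for comparison.
my $compact = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;

# The same pattern laid out with /x; whitespace and comments are ignored.
my $spaced = qr{
    (?<! \. . )   # not preceded by a literal "." plus any one character
    ( [?!.] )     # capture one sentence-ending mark
    \s            # followed by one whitespace character
    (?! . \. )    # not followed by any one character plus a literal "."
}x;

# Illustrative sample text, not taken from the original question.
my $text = 'It rained at 5 p.m. today. Did you see? Yes!';
my @a = split $compact, $text;
my @b = split $spaced,  $text;
print "same result\n" if "@a" eq "@b";
```

The character class is written without backslashes here, since ?, ! and . are already literal inside brackets.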

Answer 4

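The first part, (?<!\..), is a negative look-behind. It specifies a pattern that invalidates the match. In this case it looks for two characters: the first is a period and the second is any single character.

The second part is a standard capture group, which could be better written as ([?!.]) (you don't need the escapes inside a character class). It matches one sentence-ending punctuation mark.

The next part is a single (??) whitespace character: \s

The last part is a negative look-ahead: (?!.\.). Again, it guards against a single character followed by a period.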
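To see the four parts working together, here is a minimal sketch that feeds the pattern to split. Because the punctuation sits in a capture group, split returns each mark as its own field, so the sketch glues it back onto the text before it. The helper name split_sentences and the sample text are hypothetical.

```perl
use strict;
use warnings;

my $re = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;

# Hypothetical helper: split on the pattern, then reattach each captured
# punctuation mark to the sentence it ended.
sub split_sentences {
    my ($text) = @_;
    my @parts = split $re, $text;   # text, mark, text, mark, ..., text
    my @sentences;
    while (@parts) {
        my $body = shift @parts;
        my $mark = @parts ? shift @parts : '';
        push @sentences, $body . $mark;
    }
    return @sentences;
}

# Illustrative sample text, not taken from the original question.
print "$_\n" for split_sentences(
    'I arrived at 5 p.m. sharp. Was it late? No! It was fine.'
);
```

On this input, the look-behind is what keeps "p.m." attached to "sharp": the period after "m" has a period two characters before it.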

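This should work, relatively speaking, but I don't think I'd recommend it. I don't see what the coder was trying to achieve by making sure that only a period isn't the second-most-recent character, or isn't the second character to come.

I mean, if you're looking to break on terminal punctuation, why wouldn't you want to guard against the same class two characters back or two characters ahead? Instead, it relies only on periods not being there. So a more regular expression would be: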

/(?<![?!.].)([?!.])\s(?!.[?!.])/
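To get a feel for how much the two patterns differ in practice, the following sketch counts the break points each one finds in a few made-up strings. The helper count_breaks and the inputs are assumptions for illustration only.

```perl
use strict;
use warnings;

my $original = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;
my $widened  = qr/(?<![?!.].)([?!.])\s(?!.[?!.])/;

# Hypothetical helper: number of places where $re would split $text.
sub count_breaks {
    my ($re, $text) = @_;
    my $count = () = $text =~ /$re/g;   # one capture per match
    return $count;
}

# Illustrative inputs, not taken from the original question.
for my $text ('Wow!!! Great.', 'Who? I! Really.', 'It works. Good.') {
    printf "%-18s original=%d widened=%d\n", $text,
        count_breaks($original, $text), count_breaks($widened, $text);
}
```

The two patterns agree on the last input and disagree on the first two, where the widened version treats repeated or closely spaced ! and ? the way the original treats periods. Whether that is an improvement depends on the text being split.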

Answer 5

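The parts:

([\?\!\.])\s: split at an end character (., ! or ?) followed by a whitespace character (space, tab, newline)
(?<!\..): the two characters before this end character must not be a . followed by any character
(?!.\.): after the whitespace character, a single character must not be directly followed by a .

Those look-ahead ((?!) and look-behind ((?<!) assertions mainly seem intended to prevent splitting inside (whitespaced?) abbreviations (q. e. d. and the like).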
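The abbreviation guess can be checked with a small sketch that marks every break point with " | " instead of splitting. The helper mark_breaks and the sample strings are hypothetical, chosen to exercise the look-ahead, the look-behind, and one case that neither assertion covers.

```perl
use strict;
use warnings;

my $re = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;

# Hypothetical helper: keep the punctuation mark and replace the
# whitespace after it with " | " wherever the pattern would split.
sub mark_breaks {
    my ($text) = @_;
    (my $marked = $text) =~ s/$re/$1 | /g;
    return $marked;
}

# Illustrative inputs, not taken from the original question.
print mark_breaks($_), "\n" for (
    'This holds, q. e. d. The end.',       # look-ahead protects "q. e."
    'We met at 5 p.m. yesterday. Fine.',   # look-behind protects "p.m."
    'By J. K. Rowling.',                   # still breaks after "K."
);
```

The first two strings get a single break each, after "q. e. d." and after "yesterday.". The third also gets a break between "K." and "Rowling", because an ordinary word follows the last initial, so the protection only covers abbreviations that are followed by another abbreviated letter.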

How does this regular expression split text into sentences?
