Regex is not splitting this sentence

Date: 2015-01-08 22:27:28

Tags: php regex

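I am using the following code to split text into sentences. It generally works well, but in the case below it only raises an error. Any idea why it can't handle this sentence?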

 $re = '/# Split sentences on whitespace between them.
     (?<=                # Begin positive lookbehind.
       [.!?:]            # Either an end of sentence punct,
     | [.!?:][\'"]
     | [\r\t\n]         # or end of sentence punct and quote.
     )                   # End positive lookbehind.
     (?<!                # Begin negative lookbehind.
       Mr\.              # Skip either "Mr."
     | Mrs\.             # or "Mrs.",   
     | Ms\.              # or "Ms.",
     | Jr\.              # or "Jr.",
     | Dr\.              # or "Dr.",
     | Prof\.            # or "Prof.",
     | U\.S\.A\.
     | Sr\.              # or "Sr.",
     | T\.V\.A\.         # or "T.V.A.",
     | a\.m\.            # or "a.m.",
     | p\.m\.            # or "p.m.",
     | a€¢\.
     | :\.
     | ?\.

                         # or... (you get the idea).
     )                   # End negative lookbehind.
     \s+                 # Split on whitespace between sentences.

     /ix';

 $english = "Support services, such as help with transportation or clothing, may also be available. How do I receive these services?";

 $english = preg_split($re, $row['english'], -1, PREG_SPLIT_NO_EMPTY);

 print_r($english);

Even though the input should match, I keep getting this error:

 PHP Warning:  preg_split(): Compilation failed: nothing to repeat at offset 736 in parse2.php on line 32

2 answers:

Answer 0 (score: 4)

? is a special character in a regex (a quantifier), so you need to escape it to match a literal question mark:

$re = '/# Split sentences on whitespace between them.
 (?<=                # Begin positive lookbehind.
   [.!?:]            # Either an end of sentence punct,
 | [.!?:][\'"]       # or end of sentence punct and quote,
 | [\r\t\n]          # or a CR, tab, or newline.
 )                   # End positive lookbehind.
 (?<!                # Begin negative lookbehind.
   Mr\.              # Skip either "Mr."
 | Mrs\.             # or "Mrs.",   
 | Ms\.              # or "Ms.",
 | Jr\.              # or "Jr.",
 | Dr\.              # or "Dr.",
 | Prof\.            # or "Prof.",
 | U\.S\.A\.
 | Sr\.              # or "Sr.",
 | T\.V\.A\.         # or "T.V.A.",
 | a\.m\.            # or "a.m.",
 | p\.m\.            # or "p.m.",
 | a€¢\.
 | :\.
 | \?\.              # <=== over here.

                     # or... (you get the idea).
 )                   # End negative lookbehind.
 \s+                 # Split on whitespace between sentences.

 /ix';
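To see the failure in isolation, here is a minimal sketch. The two short patterns are hypothetical stand-ins for the full regex above, and the exact wording of the warning depends on the PCRE version bundled with PHP ("nothing to repeat" in the question's output; newer versions may word it differently).

```php
<?php
// Hypothetical minimal patterns, not the asker's full regex.
// In /x mode the whitespace after | is ignored, so the bare ? has nothing to quantify.
$broken = '/(?<! Mr\. | ?\. ) \s+ /x';
// \? matches a literal question mark instead.
$fixed  = '/(?<! Mr\. | \?\. ) \s+ /x';

// @ suppresses the compile warning so the return value can be inspected instead.
$brokenResult = @preg_split($broken, 'One. Two.');
$fixedResult  = preg_split($fixed, 'One. Two.');

var_dump($brokenResult === false); // true when the pattern failed to compile
print_r($fixedResult);             // the two sentences, split on the space
```

When a pattern fails to compile, preg_split() returns false rather than an array, so checking the return value is a reasonable guard before calling print_r() on it.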

Answer 1 (score: 1)

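Nice catch by Daniel. RegexFormat 5 reports a quantifier with nothing to quantify. Because the pattern is written in expanded (free-spacing) mode, the whitespace before the ? is ignored, so the ? follows the | directly and has nothing to quantify. If it is meant as a literal, it should be escaped.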

     # Split sentences on whitespace between them.
     (?<=                          # Begin positive lookbehind.
          [.!?:]                        # Either an end of sentence punct,
       |  [.!?:] ['"]                  # or end of sentence punct and quote,
       |  [\r\t\n]                      # or a CR, tab, or newline.
     )                             # End positive lookbehind.
     (?<!                          # Begin negative lookbehind.
          Mr\.                          # Skip either "Mr."
       |  Mrs\.                         # or "Mrs.",   
       |  Ms\.                          # or "Ms.",
       |  Jr\.                          # or "Jr.",
       |  Dr\.                          # or "Dr.",
       |  Prof\.                        # or "Prof.",
       |  U\.S\.A\.
       |  Sr\.                          # or "Sr.",
       |  T\.V\.A\.                     # or "T.V.A.",
       |  a\.m\.                        # or "a.m.",
       |  p\.m\.                        # or "p.m.",
       |  a€¢\.
       |  :\.
       |  
     =    ?  <-- Quantifies nothing
          \.

                                        # or... (you get the idea).
     )                             # End negative lookbehind.
     \s+                           # Split on whitespace between sentences.
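One way to avoid hand-escaping mistakes like this is to build the negative lookbehind from a plain list and let preg_quote() do the escaping. The sketch below is only an illustration: the function name buildSentenceSplitter and the shortened abbreviation list are assumptions, not part of the original code, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper: the name and the abbreviation list are illustrative only.
function buildSentenceSplitter(array $abbreviations): string
{
    // preg_quote() escapes ., ?, : and other metacharacters, plus the / delimiter.
    $skip = implode('|', array_map(
        fn(string $a): string => preg_quote($a, '/'),
        $abbreviations
    ));

    // Same structure as the original pattern, written without /x comments.
    return '/(?<=[.!?:]|[.!?:][\'"]|[\r\t\n])(?<!' . $skip . ')\s+/i';
}

$re = buildSentenceSplitter(['Mr.', 'Mrs.', 'Dr.', 'a.m.', 'p.m.', ':.', '?.']);

$text = 'Support services, such as help with transportation or clothing, '
      . 'may also be available. How do I receive these services?';

$sentences = preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($sentences);

// An abbreviation in the list should not trigger a split.
$withTitle = preg_split($re, 'Dr. Smith arrived. He left.', -1, PREG_SPLIT_NO_EMPTY);
print_r($withTitle);
```

Because each entry is escaped mechanically, adding an item such as '?.' to the list cannot reintroduce the unescaped quantifier.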