Splitting a string on dates

Time: 2015-01-05 21:23:59

Tags: php regex

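I am using the following script, which I modified to split a large string into sentences. However, I am having trouble getting it to also split on dates.

Original working code: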

$re = '/# Split sentences on whitespace between them.
(?<=                # Begin positive lookbehind.
  [.!?:]            # Either an end of sentence punct,
| [.!?:][\'"]       # or end of sentence punct and quote,
| [\r\t\n]          # or a carriage return, tab, or newline.
)                   # End positive lookbehind.
(?<!                # Begin negative lookbehind.
  Mr\.              # Skip either "Mr."
| Mrs\.             # or "Mrs.",    
| Ms\.              # or "Ms.",
| Jr\.              # or "Jr.",
| Dr\.              # or "Dr.",
| Prof\.            # or "Prof.",
| U\.S\.A\.
| Sr\.              # or "Sr.",
| T\.V\.A\.         # or "T.V.A.",
| a\.m\.            # or "a.m.",
| p\.m\.            # or "p.m.",
| •\.
| :\.
| •\.

                    # or... (you get the idea).
)                   # End negative lookbehind.
\s+                 # Split on whitespace between sentences.

/ix';

$sentences = preg_split($re, $block_o_text, -1, PREG_SPLIT_NO_EMPTY);
for ($i = 0; $i < count($sentences); ++$i) {

I added [0-9]/[0-9]/[0-9], but it does not seem to have the desired effect. What am I missing? Here is my updated code:

$re = '/# Split sentences on whitespace between them.
(?<=                # Begin positive lookbehind.
  [.!?:]            # Either an end of sentence punct,
| [.!?:][\'"]       # or end of sentence punct and quote,
| [\r\t\n]          # or a carriage return, tab, or newline,
| [0-9]/[0-9]/[0-9] # or on a date
)                   # End positive lookbehind.
(?<!                # Begin negative lookbehind.
  Mr\.              # Skip either "Mr."
| Mrs\.             # or "Mrs.",    
| Ms\.              # or "Ms.",
| Jr\.              # or "Jr.",
| Dr\.              # or "Dr.",
| Prof\.            # or "Prof.",
| U\.S\.A\.
| Sr\.              # or "Sr.",
| T\.V\.A\.         # or "T.V.A.",
| a\.m\.            # or "a.m.",
| p\.m\.            # or "p.m.",
| •\.
| :\.
| •\.

                    # or... (you get the idea).
)                   # End negative lookbehind.
\s+                 # Split on whitespace between sentences.

/ix';

1 Answer:

Answer 0 (score: 1)

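Dates do not consist of single digits only, especially the year, so you need to account for that. You also need to escape the /, because it is your regex delimiter.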

[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4}
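
Here is a minimal sketch of how this could be wired into a split, under a few assumptions. The function name `split_on_sentences_and_dates` and the sample text are made up for illustration, and the abbreviation list is shortened. PCRE has traditionally required every alternative inside a lookbehind to have a fixed length, so depending on your PHP/PCRE version the `{1,2}` and `{2,4}` quantifiers may be rejected there. The sketch assumes that restriction applies and spells the date endings out as fixed-length alternatives. It also uses `~` as the delimiter, so the slashes need no escaping.

```php
<?php
// Hypothetical helper, not from the original post: splits text after
// sentence punctuation and after dates such as 1/5/2015 or 02/04/15.
function split_on_sentences_and_dates(string $text): array
{
    // "~" is the delimiter here, so "/" needs no escaping inside the pattern.
    $re = '~
    (?<=                               # Positive lookbehind: split after...
        [.!?:]                         # end-of-sentence punctuation,
      | [.!?:][\'"]                    # punctuation plus a closing quote,
      | [0-9]/[0-9]/[0-9]{2}           # a date ending in D/YY,
      | [0-9]/[0-9]/[0-9]{4}           # a date ending in D/YYYY,
      | [0-9]/[0-9]{2}/[0-9]{2}        # a date ending in DD/YY,
      | [0-9]/[0-9]{2}/[0-9]{4}        # or a date ending in DD/YYYY.
    )
    (?<! Mr\. | Mrs\. | Dr\. )         # Shortened abbreviation list.
    \s+                                # The whitespace that gets removed.
    ~ix';

    return preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
}

// The pattern from the answer, anchored so it checks one whole token.
$date = '/^[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4}$/';
var_dump(preg_match($date, '1/5/2015'));  // int(1)
var_dump(preg_match($date, '2015'));      // int(0)

$sample = 'Invoice sent 1/5/2015 Payment is due 02/04/15 Thank you. Mr. Smith will call.';
print_r(split_on_sentences_and_dates($sample));
```

With the sample text, the pieces should come out as "Invoice sent 1/5/2015", "Payment is due 02/04/15", "Thank you." and "Mr. Smith will call.". Three-digit years or other separators would need further fixed-length alternatives in the lookbehind.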