Question

//START GET DATES
$regexp = '/[0-9]{2,4}[-\/ ]{1}([A-Za-z]{3}|[0-9]{2})[-\/ ]{1}[0-9]{2,4}/i';

preg_match_all($regexp, $output, $dates);

//Dec 05, 1995 + December 5, 1995
$regexp = '/\b[[A-Za-z]{3,9}\b[ 0-9\,]{2,5}[0-9]{4}/i';
preg_match_all($regexp, $output, $dates);

//09 Aug 2012
$regexp = '/[0-9]{2}[ ]{1}[A-Za-z]{3}[ ]{1}[0-9]{4}/i';
preg_match_all($regexp, $output, $dates);
print_r($dates);

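Above are my regular expressions for extracting dates in various formats from a block of text.

The expressions worked perfectly, and as far as I know absolutely nothing has changed.

Can anyone tell me whether there is anything wrong with the expressions and, if not, what else could have caused this sudden breakage?

Cheers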

Answer 1

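It's hard to give an exact answer without more information, but a few things come to mind:

These are rather sloppy regexes:
- [A-Za-z] combined with the case-insensitive option, which makes the explicit A-Z redundant.
- [[A-Za-z], where the doubled opening bracket adds a literal [ to the character class.
- {1}, repeatedly, which is a no-op.
- Unnecessary escapes, and so on. I wouldn't be surprised if they also contained bugs.

You are applying the regexes in sequence. I don't know PHP, but it looks like the results of each match are overwritten by the next preg_match_all call. Perhaps you do get results, but they are then overwritten by a later regex that matches nothing?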
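
To illustrate the overwriting concern, here is a minimal PHP sketch. The sample text, the two simplified patterns, and the variable names $first, $overwritten and $all are assumptions made for illustration and are not taken from the question. preg_match_all() replaces whatever its third argument previously held, so reusing $dates keeps only the result of the last call; merging the matches from each call is one way to keep them all.

```php
<?php
// Hypothetical sample text standing in for $output from the question.
$output = 'Released 2012-08-09 and patched later.';

// Simplified illustrative patterns, not the ones from the question.
$patterns = [
    '/\d{4}-\d{2}-\d{2}/',      // e.g. 2012-08-09
    '/\d{2} [a-z]{3} \d{4}/i',  // e.g. 09 Aug 2012
];

// Reusing one variable: each call replaces the previous contents of $dates.
preg_match_all($patterns[0], $output, $dates);
$first = $dates[0];        // matches of the first pattern
preg_match_all($patterns[1], $output, $dates);
$overwritten = $dates[0];  // second pattern matched nothing here, so this is empty

// Collecting instead: merge the full matches of every pattern into one list.
$all = [];
foreach ($patterns as $pattern) {
    if (preg_match_all($pattern, $output, $m)) {
        $all = array_merge($all, $m[0]);
    }
}

print_r($first);
print_r($overwritten);
print_r($all);
```

If the final print_r($dates) in the question shows an empty result, checking each pattern's matches separately like this would show whether an earlier pattern did find something.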

So let's try to find you a better regex, a single one. How about:

preg_match_all(
    '%\b                  # Start at a word boundary
    (?:                   # Match the following:
     (?:                  # either
      \d+\b               # a number,
      (?:\.|st|nd|rd|th)* # followed by a dot, st, nd, rd, or th (optional)
      |                   # or a month name
      (?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)\b
     )
     [\s.,/-]*            # followed by a date separator, comma or whitespace (opt.)
    ){3}                  # Do this three times
    (?<!\s)               # Don\'t match trailing whitespace
    %ix', 
    $output, $dates, PREG_PATTERN_ORDER);
$dates = $dates[0];
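
As a quick check of the single-regex approach, the following sketch runs the same pattern against a made-up sample string. The sample text and the variable names $pattern and $found are assumptions for illustration only; the pattern itself is copied from the snippet above, with the last comment reworded to avoid an escaped quote.

```php
<?php
// Hypothetical sample text containing the three date styles from the question.
$output = 'Posted 09 Aug 2012 by admin. Updated December 5, 1995 by staff. '
        . 'Archived 2012-08-09 at noon.';

$pattern = '%\b                  # Start at a word boundary
    (?:                   # Match the following:
     (?:                  # either
      \d+\b               # a number,
      (?:\.|st|nd|rd|th)* # followed by a dot, st, nd, rd, or th (optional)
      |                   # or a month name
      (?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)\b
     )
     [\s.,/-]*            # followed by a date separator, comma or whitespace (opt.)
    ){3}                  # Do this three times
    (?<!\s)               # No trailing whitespace in the match
    %ix';

preg_match_all($pattern, $output, $dates, PREG_PATTERN_ORDER);
$found = $dates[0];  // full matches only

print_r($found);
// Array
// (
//     [0] => 09 Aug 2012
//     [1] => December 5, 1995
//     [2] => 2012-08-09
// )
```

Note that the pattern is deliberately permissive: any three numbers or month names in a row, such as "1 2 3", will also match, so the results may need a validation pass before being treated as dates.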
