Question

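I have a PDF file that I read with the "pdfparser" plugin. From the page text I need to find the first date after a specific string (the search string). I can find the search string, and to extract the date I use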

date_parse($string)

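It extracts the day and month fine, but I think that because the string is large (it contains more dates and numbers) it does not fill in the correct year. It gives a seemingly random number that does not even exist in the document.

Is there any other way to get that date? Below is a sample string. (In the real text, more dates follow this one.)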

Satisfaction of the mortgage from Karen Ann Lewis,a single woman to Bank of America, N.A. recorded March 4, 2004

Answer 1

You can try the following regex, which extracts the first date in the format you provided, and then pass the match to date_parse():

$str = 'Satisfaction of the mortgage from Karen Ann Lewis,a single woman to Bank of America, N.A. recorded March 4, 2004';

// Match the first date written as "Month D, YYYY" (full or abbreviated month name).
preg_match("/(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}/", $str, $matches);

// Parse only the matched date text, not the whole page string.
var_dump( date_parse($matches[0]) );

The above outputs:

array(12) {
  ["year"]     => int(2004)
  ["month"]    => int(3)
  ["day"]      => int(4)
  ["hour"]     => bool(false)
  ["minute"]   => bool(false)
  ["second"]   => bool(false)
  ["fraction"] => bool(false)
  ...
}
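Since the question asks for the first date after a search string, not the first date on the whole page, the same idea can be sketched with the offset argument of preg_match(). This is only a minimal sketch: the helper name firstDateAfter, the search string, and the extra dates in the sample text are made up for illustration, and it assumes dates are always written as "Month D, YYYY".

```php
<?php
// Hypothetical helper (not part of the original answer): returns the first
// "Month D, YYYY" date found after $needle in $text, or null if none.
function firstDateAfter(string $text, string $needle): ?array
{
    $pos = strpos($text, $needle);
    if ($pos === false) {
        return null; // search string not present
    }

    // Same month alternatives as above, with non-capturing groups.
    $pattern = '/\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?'
             . '|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
             . '\s+\d{1,2},\s+\d{4}\b/';

    // The fifth argument tells preg_match() where to start scanning (byte offset).
    $offset = $pos + strlen($needle);
    if (preg_match($pattern, $text, $m, 0, $offset) !== 1) {
        return null; // no date after the search string
    }

    // Parse only the matched date text so other numbers cannot interfere.
    $parsed = date_parse($m[0]);

    return [
        'text'  => $m[0],
        'year'  => $parsed['year'],
        'month' => $parsed['month'],
        'day'   => $parsed['day'],
    ];
}

// Illustrative page text: the dates before and after the target are invented.
$page = 'Deed dated January 2, 1999. Satisfaction of the mortgage from Karen Ann Lewis,'
      . 'a single woman to Bank of America, N.A. recorded March 4, 2004 '
      . 'and re-recorded May 10, 2005';

$result = firstDateAfter($page, 'Satisfaction of the mortgage');
```

Passing an offset instead of cutting the string with substr() keeps the original text intact, and returning null lets the caller tell "search string missing" or "no date found" apart from a parsed date.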


Find the first date in a string
