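I have to process text from student theses (the text can be very large).
I need to use preg_match in PHP to find dates in a string, which may appear in any of these forms: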
...blah blah blah (1994) blah blah blah ...
...blah blah blah (nov-1994) blah blah blah ...
...blah blah blah (november-1994) blah blah blah ...
...blah blah blah (1994-nov) blah blah blah ...
...blah blah blah (1994-november) blah blah blah ...
The dates in the string may be wrapped in '()' or in '[]'.
This is what I tried:
if (preg_match('/\w{0,8}-?(19|20)\d{2}-?\w{0,8}/', $string, $s)) {
    # code
}
This works, but it also catches some irrelevant strings, such as:
... blah blah blah (SKU_1956) blah blah blah ...
... blah blah blah [INFERNO2000] blah blah blah ...
... blah blah blah [like-2000-me] blah blah blah ...
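I can't seem to get this right, so I need help fine-tuning this regex so that it captures only the date forms listed above.
The word is limited to 8 characters because of the longest month names (such as december).
As it stands, the pattern captures a lot of irrelevant strings, which is why I want to fine-tune it.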
Answer 0: (score: 1)
You can use the regex [(\[](([a-zA-Z]{1,8}-)?(19|20)\d{2}|(19|20)\d{2}-[a-zA-Z]{1,8})[)\]]
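[(\[] ... [)\]] matches the surrounding () or []
([a-zA-Z]{1,8}-)?(19|20)\d{2} matches month-YEAR, where the month is optional
    ([a-zA-Z]{1,8}-)? matches between 1 and 8 alphabetic characters followed by -
    (19|20)\d{2} matches 19.. or 20..
(19|20)\d{2}-[a-zA-Z]{1,8} matches YEAR-month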
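To try this pattern against the strings from the question, here is a minimal sketch. The function name findBracketedDates is hypothetical, and the inner groups are made non-capturing (an adjustment not in the answer) so that group 1 holds the date text without the brackets.

```php
<?php
// Hypothetical helper: returns every date-like token found inside () or [].
// The pattern is the one from this answer, with the inner groups changed to
// non-capturing (?:...) so that group 1 is the text between the brackets.
function findBracketedDates(string $text): array
{
    $pattern = '/[(\[]((?:[a-zA-Z]{1,8}-)?(?:19|20)\d{2}|(?:19|20)\d{2}-[a-zA-Z]{1,8})[)\]]/';
    preg_match_all($pattern, $text, $matches);
    return $matches[1]; // group 1 of every match, in order of appearance
}

// The wanted forms are found.
print_r(findBracketedDates('blah (1994) blah (nov-1994) blah [1994-november] blah'));

// The unrelated strings from the question are not.
print_r(findBracketedDates('(SKU_1956) [INFERNO2000] [like-2000-me]'));
```

A few limits are worth knowing. The letter group accepts any word of up to 8 letters, so (like-2000) would still match. "september" has 9 letters, so (september-1994) is missed unless {1,8} is widened to {1,9}. The opening and closing brackets are not paired, so (1994] also matches.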
Answer 1: (score: 0)
You can list all the valid date formats in an array:
$formats = ["M-Y", "Y", "F-Y", "Y-F", "Y-M"];
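Then use a regex pattern that captures whatever is inside the parentheses in group 1, and loop over the formats to test whether a valid DateTime can be created from it: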
$strings = [
    "...blah blah blah (1994) blah blah blah ... ",
    "...blah blah blah (nov-1994) blah blah blah ... ",
    "...blah blah blah (november-1994) blah blah blah ...",
    "...blah blah blah (1994-nov) blah blah blah ...",
    "...blah blah blah (1994-november), (1994), (nov-1994) blah blah blah ...",
    "...blah blah blah (1994-november) blah blah blah ..."
];
// M = month name, F = full month name, Y = year
$formats = ["M-Y", "Y", "F-Y", "Y-F", "Y-M"];
// Group 1 captures everything between "(" and the next ")"
$pattern = '/\(([^)]+)\)/';
foreach ($strings as $string) {
    preg_match_all($pattern, $string, $matches);
    foreach ($matches[1] as $match) {
        foreach ($formats as $format) {
            // createFromFormat() returns false when $match does not fit $format
            if (DateTime::createFromFormat($format, $match) !== false) {
                echo "$string contains valid date: <b>$match</b>" . PHP_EOL;
                break; // one matching format is enough
            }
        }
    }
}
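The loop above only looks inside (), while the question also mentions []. The following is a sketch of how the same idea could be wrapped in a function that handles both bracket types and also requires a 19xx or 20xx year. The function name extractBracketedDates and the extra year check are assumptions added here, not part of this answer; the check is there because the Y format is documented as accepting up to 4 digits, so a short number in brackets might otherwise pass as a year.

```php
<?php
// Hypothetical helper: returns the bracketed tokens that parse as a date
// in one of the formats from the answer.
function extractBracketedDates(string $text): array
{
    $formats = ["M-Y", "Y", "F-Y", "Y-F", "Y-M"];
    $found = [];

    // Group 1: the text between ( or [ and the next ) or ].
    preg_match_all('/[(\[]([^()\[\]]+)[)\]]/', $text, $matches);

    foreach ($matches[1] as $candidate) {
        // Assumption: only years 1900-2099 count, as in the question's regex.
        if (!preg_match('/(?<!\d)(?:19|20)\d{2}(?!\d)/', $candidate)) {
            continue;
        }
        foreach ($formats as $format) {
            // false means the candidate does not fit this format at all.
            if (DateTime::createFromFormat($format, $candidate) !== false) {
                $found[] = $candidate;
                break;
            }
        }
    }
    return $found;
}

print_r(extractBracketedDates('blah (1994-november), [1994], (nov-1994) blah'));
print_r(extractBracketedDates('(SKU_1956) [INFERNO2000] [like-2000-me]'));
```

Because the month is parsed by DateTime rather than matched as "any letters", a token such as (like-2000) is rejected here, at the cost of one parse attempt per format for each bracketed token.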