How to get the page number from these example file names (using PHP)

Time: 2014-10-22 07:05:54

Tags: php regex

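I have file names in several different formats.

How can I get 123.pdf, 124.pdf, and 125.pdf from them? The length of the file name can vary; 14-5678 is irrelevant here and should be ignored.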

  • 14-5678_jobname_0123_.p1.PDF
  • 14-5678_jobname_0123_.p2.PDF
  • 14-5678_jobname_0125_.p1.PDF
  • Weired_filename_0123_bla_14-5678_jobname.p1.PDF
  • Weired_filename_0123_bla_14-5678_jobname.p2.PDF
  • Weired_filename_0125_bla_14-5678_jobname.p1.PDF
  • 14-5678_jobname_0123.p1.PDF
  • 14-5678_jobname_0123.p2.PDF
  • 14-5678_jobname_0125.p1.PDF
  • 0123_14-5678_jobname.p1.PDF
  • 0123_14-5678_jobname.p2.PDF
  • 0125_14-5678_jobname.p1.PDF
  • jobname_0123_14-5678.p1.PDF
  • jobname_0123_14-5678.p2.PDF
  • jobname_0125_14-5678.p1.PDF

I have spent hours trying things in regex testers and am now completely stuck. I would love some PHP code that does the job.

1 answer:

Answer 0: (score: 0)

You need to match a run of four digits that is not preceded by a dash:

/[^-](\d{4})/

Breaking down the regex:

  • [^-]: one character that is not a dash
  • \d{4}: four digits
  • (\d{4}): capture those digits

You can then append .pdf to get the file name.
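One thing to watch with the short pattern: [^-] has to consume a character, so it cannot match when the four digits open the file name (as in 0123_14-5678_jobname.p1.PDF). A minimal sketch of an alternative, assuming a negative lookbehind is acceptable; the helper name extractNumber is made up for illustration and is not part of the original answer:

```php
<?php
// Hypothetical helper (not from the answer): returns the four-digit run that
// is not directly preceded by a dash or another digit, or null if none exists.
function extractNumber(string $name): ?string
{
    // (?<![-\d]) : the previous character is neither a dash nor a digit
    // (\d{4})    : capture exactly four digits
    // (?!\d)     : no fifth digit follows
    if (preg_match('/(?<![-\d])(\d{4})(?!\d)/', $name, $m)) {
        return $m[1];
    }
    return null;
}

echo extractNumber('0123_14-5678_jobname.p1.PDF') . '.pdf', PHP_EOL;  // 0123.pdf
echo extractNumber('14-5678_jobname_0123_.p1.PDF') . '.pdf', PHP_EOL; // 0123.pdf
```

Because the lookbehind does not consume a character, the match can start at the very beginning of the string.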

A preg_replace example, with the file names you gave stored in an array:

foreach ($files as $f) {
    echo "$f => " . preg_replace("/.*?[^-]*(\d{4}).+/", "$1.pdf", $f) . PHP_EOL;
}
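The loop assumes $files already holds the names. A minimal runnable sketch with four of the names from the question; the wrapper name pdfName is an assumption made for illustration. Note that this variant ignores the p1/p2 suffix, so a p2 file still maps to the base number:

```php
<?php
// A few of the file names from the question; the answer's loop expects
// them in an array called $files.
$files = [
    '14-5678_jobname_0123_.p1.PDF',
    'Weired_filename_0125_bla_14-5678_jobname.p1.PDF',
    '0123_14-5678_jobname.p2.PDF',
    'jobname_0125_14-5678.p1.PDF',
];

// Hypothetical wrapper around the answer's preg_replace call.
// preg_replace returns null on a regex error, hence the nullable return type.
function pdfName(string $file): ?string
{
    return preg_replace('/.*?[^-]*(\d{4}).+/', '$1.pdf', $file);
}

foreach ($files as $f) {
    echo "$f => " . pdfName($f) . PHP_EOL;
}
```

If a name contains no four-digit run at all, preg_replace leaves it unchanged, so it may be worth checking the result before using it as a file name.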

ETA: if you want to take the page number into account, you can use the following code:

foreach ($files as $f) {
    # this saves the four digits of the PDF name, and the number in p1/p2
    if (!preg_match("/.*?[^-]*(\d{4}).*?p(\d+)\.pdf/i", $f, $matches)) {
        continue; # skip names that do not match, so $matches is never read when empty
    }
    # if the number (from p1/p2) is greater than 1, add the offset (page - 1) to the PDF name number
    if ($matches[2] > 1) {
        $matches[1] += $matches[2] - 1;
    }
    # format the pdf name to be four digits long, with zero padding for shorter names
    echo "$f => " . sprintf('%04d.pdf',  $matches[1]) . PHP_EOL;
}

Output:

14-5678_jobname_0123_.p1.PDF => 0123.pdf
14-5678_jobname_0123_.p2.PDF => 0124.pdf
14-5678_jobname_0125_.p1.PDF => 0125.pdf
Weired_filename_0123_bla_14-5678_jobname.p1.PDF => 0123.pdf
Weired_filename_0123_bla_14-5678_jobname.p2.PDF => 0124.pdf
Weired_filename_0125_bla_14-5678_jobname.p1.PDF => 0125.pdf
14-5678_jobname_0123.p1.PDF => 0123.pdf
14-5678_jobname_0123.p2.PDF => 0124.pdf
14-5678_jobname_0125.p1.PDF => 0125.pdf
0123_14-5678_jobname.p1.PDF => 0123.pdf
0123_14-5678_jobname.p2.PDF => 0124.pdf
0125_14-5678_jobname.p1.PDF => 0125.pdf
jobname_0123_14-5678.p1.PDF => 0123.pdf
jobname_0123_14-5678.p2.PDF => 0124.pdf
jobname_0125_14-5678.p1.PDF => 0125.pdf
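The question asks for 123.pdf rather than 0123.pdf. A minimal sketch of the same approach without zero padding, assuming the leading zero should simply be dropped; the function name pageFileName is made up for illustration:

```php
<?php
// Hypothetical helper: same regex and page arithmetic as the code above,
// but formatted with %d so no zero padding is added.
function pageFileName(string $file): ?string
{
    if (!preg_match('/.*?[^-]*(\d{4}).*?p(\d+)\.pdf/i', $file, $m)) {
        return null; // name does not follow any of the expected patterns
    }
    // (int) drops the leading zero: "0123" becomes 123.
    // p1 adds 0, p2 adds 1, and so on; max() keeps a p0 from subtracting.
    $number = (int) $m[1] + (max((int) $m[2], 1) - 1);
    return sprintf('%d.pdf', $number);
}

echo pageFileName('14-5678_jobname_0123_.p2.PDF'), PHP_EOL; // 124.pdf
echo pageFileName('0125_14-5678_jobname.p1.PDF'), PHP_EOL;  // 125.pdf
```

Returning null for names that do not match lets the caller decide whether to skip or report them.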