Question

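I need a way to count the number of pages in a PDF using PHP. I've done some Googling, but everything I found uses shell/bash scripts, Perl, or other languages, and I need something in native PHP. Is there a library, or an example of how to do this?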

Answer 1

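If you're on Linux, this is much faster than using identify to get the page count (especially when there are many pages):

exec('/usr/bin/pdfinfo ' . escapeshellarg($tmpfname) . ' | awk \'/^Pages:/ {print $2}\'', $output);
// $output[0] now holds the page count

You do need to have pdfinfo installed (it ships with poppler-utils or xpdf).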
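A minimal sketch of the same idea that parses pdfinfo's output in PHP instead of piping through awk. It assumes pdfinfo is on the PATH and a *nix shell (for `2>/dev/null`); `parsePdfinfoPages` and `pdfinfoPageCount` are hypothetical helper names, not part of pdfinfo or PHP.

```php
<?php
// Extract the number from the "Pages:" line of pdfinfo's text output.
function parsePdfinfoPages(string $output): ?int
{
    if (preg_match('/^Pages:\s+(\d+)/m', $output, $m)) {
        return (int) $m[1];
    }
    return null; // no "Pages:" line found
}

// Run pdfinfo (assumed to be on PATH) and parse the result in PHP.
function pdfinfoPageCount(string $path): ?int
{
    $lines = [];
    $status = 0;
    // escapeshellarg() keeps file names with spaces or shell metacharacters safe
    exec('pdfinfo ' . escapeshellarg($path) . ' 2>/dev/null', $lines, $status);
    if ($status !== 0) {
        return null; // pdfinfo missing, or the file could not be read
    }
    return parsePdfinfoPages(implode("\n", $lines));
}
```

Anchoring the regex on `^Pages:` avoids picking up a document title or other metadata field that happens to contain the word "Pages".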

Answer 2

You can use PHP's ImageMagick extension. ImageMagick understands PDFs, and you can use the identify command to extract the page count. The PHP function is Imagick::identifyImage().

Answer 3

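I know this is an old question... but if it's relevant to me now, it may be relevant to others too.

I just worked out a way to get the page count, because the methods listed here are inefficient and extremely slow for large PDFs.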

$im = new Imagick();
$im->pingImage('name_of_pdf_file.pdf');
echo $im->getNumberImages();

Seems to work for me!
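The same pingImage() call, wrapped with some basic error handling, might look like the sketch below. `imagickPageCount` is a hypothetical name; the sketch assumes the imagick extension, and ImageMagick itself typically needs Ghostscript to read PDFs (some distributions also disable PDF reading in ImageMagick's policy.xml), so failures are turned into `null`.

```php
<?php
// Hypothetical helper around Imagick::pingImage() + getNumberImages().
function imagickPageCount(string $path): ?int
{
    // Bail out early if the extension is missing or the file does not exist
    if (!extension_loaded('imagick') || !is_file($path)) {
        return null;
    }
    try {
        $im = new Imagick();
        // pingImage() queries image attributes without reading the whole image into memory
        $im->pingImage($path);
        $count = $im->getNumberImages();
        $im->clear();
        return $count;
    } catch (ImagickException $e) {
        // e.g. Ghostscript not installed, or PDF reading blocked by policy
        return null;
    }
}
```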

Answer 4

I actually went with a combined approach. Since exec is disabled on my server, I wanted to stick with a PHP-based solution, and ended up with this:

Code:

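function getNumPagesPdf($filepath){
    // Strip an Imagick-style page selector such as "file.pdf[0]" before opening the file
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    // Only scan if fopen() succeeded; feof() on a failed handle loops forever or errors, depending on PHP version
    if ($fp !== false) {
        while (!feof($fp)) {
            $line = fgets($fp, 255);
            // Keep the largest /Count value: the root page-tree node holds the total
            if ($line !== false && preg_match('/\/Count [0-9]+/', $line, $matches)) {
                preg_match('/[0-9]+/', $matches[0], $matches2);
                if ($max < (int) $matches2[0]) $max = (int) $matches2[0];
            }
        }
        fclose($fp);
    }
    if ($max == 0) {
        // No /Count found: fall back to the much slower Imagick extension
        $im = new Imagick($filepath);
        $max = $im->getNumberImages();
    }

    return $max;
}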

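If the page count can't be found because there is no Count tag, it falls back to the imagick PHP extension. I use this two-pronged approach because the latter is very slow.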
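The two-pronged idea (cheap /Count scan first, slow Imagick only as a fallback) can be generalized to an ordered list of strategies. This is a hypothetical sketch, not the answer's code; `firstPageCount` is an illustrative name, and each strategy is any callable that takes a path and returns a page count (0 or null meaning "couldn't tell").

```php
<?php
// Try each page-count strategy in order; return the first positive result.
// Put the cheap strategies first so the slow ones only run when needed.
function firstPageCount(string $path, callable ...$strategies): ?int
{
    foreach ($strategies as $strategy) {
        $count = $strategy($path);
        if (is_int($count) && $count > 0) {
            return $count;
        }
    }
    return null; // every strategy failed
}
```

For example, `firstPageCount('doc.pdf', 'getNumPagesInPDF', function ($p) { return (new Imagick($p))->getNumberImages(); })` would try the pure-PHP scan from another answer before touching Imagick.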

Answer 5

You could try FPDI (see here): when you set the source file, it returns the page count.
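A minimal sketch, assuming FPDI 2.x installed via Composer (`setasign/fpdi` together with `setasign/fpdf`) and that `vendor/autoload.php` has already been required. `fpdiPageCount` is a hypothetical helper; the part taken from FPDI is that `setSourceFile()` returns the number of pages. The free FPDI parser is known to reject some PDFs (for example ones using compressed cross-reference streams), so exceptions are mapped to `null`.

```php
<?php
use setasign\Fpdi\Fpdi;

// Hypothetical wrapper: FPDI's setSourceFile() returns the source document's page count.
function fpdiPageCount(string $path): ?int
{
    // Requires Composer's autoloader to have been loaded beforehand
    if (!class_exists(Fpdi::class) || !is_file($path)) {
        return null;
    }
    try {
        $pdf = new Fpdi();
        return $pdf->setSourceFile($path);
    } catch (\Exception $e) {
        // e.g. a PDF the free parser cannot handle
        return null;
    }
}
```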

Answer 6

Try this:

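<?php
// Warning: opening a path taken straight from $_REQUEST lets callers read any file
// the web server can access; validate or whitelist it in real code.
$file = $_REQUEST['file'] ?? '';
if (!$fp = @fopen($file, "r")) {
        echo 'failed opening file ' . htmlspecialchars($file);
}
else {
        $max = 0;
        while (!feof($fp)) {
                $line = fgets($fp, 255);
                // Keep the largest /Count value seen: the root node holds the total
                if ($line !== false && preg_match('/\/Count [0-9]+/', $line, $matches)) {
                        preg_match('/[0-9]+/', $matches[0], $matches2);
                        if ($max < (int) $matches2[0]) $max = (int) $matches2[0];
                }
        }
        fclose($fp);
        echo 'There ' . ($max == 1 ? 'is ' : 'are ') . $max . ' page' . ($max == 1 ? '' : 's') . ' in ' . htmlspecialchars($file) . '.';
}
?>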

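The Count tag gives the number of pages under each node of the page tree. A parent node's Count is the sum of its children's counts, so the script simply looks for the maximum, which is the total page count.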
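The max-of-/Count idea can also be applied to the whole file contents at once, which avoids a "/Count 12" token being split across two 255-byte fgets() chunks. A sketch under the same assumptions as the answer (uncompressed page tree); `maxCountInPdfSource` is a hypothetical name. Note that /Count also appears in outline (bookmark) dictionaries, where it counts outline items rather than pages, so the maximum can overshoot, and page-tree nodes stored in compressed object streams are not visible to a plain-text scan.

```php
<?php
// Return the largest "/Count N" value found in raw PDF source, or 0 if none.
function maxCountInPdfSource(string $pdf): int
{
    preg_match_all('/\/Count\s+(\d+)/', $pdf, $m);
    return $m[1] ? max(array_map('intval', $m[1])) : 0;
}
```

Usage: `maxCountInPdfSource(file_get_contents('doc.pdf'))`; this reads the whole file into memory, which trades RAM for simplicity.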

Answer 7

function getNumPagesPdf($filepath) {
    // Strip an Imagick-style page selector such as "file.pdf[0]" from the path
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    if (!$fp) {
        return "Could not open file: $filepath";
    } else {
        while (!feof($fp)) {
            $line = fgets($fp, 255);
            if ($line !== false && preg_match('/\/Count [0-9]+/', $line, $matches)) {
                preg_match('/[0-9]+/', $matches[0], $matches2);
                if ($max < $matches2[0]) {
                    $max = trim($matches2[0]);
                    // Stops at the first /Count found; that is only the total
                    // if the root page-tree node appears before its children
                    break;
                }
            }
        }
        fclose($fp);
    }

    return $max;
}

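This does exactly what I needed:

I just worked out how to get a PDF's page count. Once the page count is found, I add a break to the while loop so it doesn't end up in an infinite loop.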

Answer 8

This one doesn't use Imagick:

function getNumPagesInPDF($file) 
{
    //http://www.hotscripts.com/forums/php/23533-how-now-get-number-pages-one-document-pdf.html
    if(!file_exists($file))return null;
    if (!$fp = @fopen($file,"r"))return null;
    $max=0;
    while(!feof($fp)) {
            $line = fgets($fp,255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)){
                    preg_match('/[0-9]+/',$matches[0], $matches2);
                    if ($max<$matches2[0]) $max=$matches2[0];
            }
    }
    fclose($fp);
    return (int)$max;

}

Answer 9

// Count "/Page" tokens not followed by a word character, so "/Pages" nodes are skipped
$pdftext = file_get_contents($caminho1);

$num_pag = preg_match_all("/\/Page\W/", $pdftext, $dummy);
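The same regex wrapped as a reusable function (a sketch; `countPageObjects` is a hypothetical name). It counts page objects that are visible in the raw bytes, so it can miss pages stored in compressed object streams and can overcount when an incrementally updated file still contains superseded page objects.

```php
<?php
// Count "/Page" tokens not followed by a word character:
// "/Type /Page" matches, while "/Pages" and names like "/PageMode" do not.
function countPageObjects(string $pdf): int
{
    return (int) preg_match_all('/\/Page\W/', $pdf);
}
```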

Answer 10

In a *nix environment, you can use:

exec('pdftops ' . escapeshellarg($filename) . ' - | grep showpage | wc -l', $output);

pdftops is usually installed by default.

Or, as Xethron suggested:

pdfinfo filename.pdf | grep Pages: | awk '{print $2}'
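Several answers here shell out to pdfinfo or pdftops, while the combined-approach answer notes that exec can be disabled on a server. A small hypothetical check (`execAvailable` is an illustrative name) that looks at both `function_exists()` and the `disable_functions` ini setting before choosing a shell-based method:

```php
<?php
// Return true if exec() can actually be called in this PHP configuration.
function execAvailable(): bool
{
    if (!function_exists('exec')) {
        return false;
    }
    // disable_functions is a comma-separated list; ini_get() may return false
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return !in_array('exec', $disabled, true);
}
```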

Answer 11

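Doing this with PHP alone can mean installing complicated libraries, restarting Apache, and so on. Many pure-PHP approaches (such as opening a stream and applying regular expressions) are inaccurate.

The linked answer is the only fast and reliable method I could come up with. It uses a single executable that doesn't need to be installed (*nix or Windows), plus a simple PHP script that extracts its output. Best of all, I haven't seen it report a wrong page count yet!

It can be found here, along with an explanation of why the other methods "don't work":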

Get the number of pages in a PDF document

Count the number of pages in a PDF using only PHP
