Question

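I am using getpdftext.pl from CAM::PDF to extract the text of a PDF and print it, but in my web application I want to call getpdftext.pl from a .cgi script. Can you suggest what to do or how to proceed? I tried converting getpdftext.pl into getpdftext.cgi, but it does not work.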

Thanks, all.

Here is an excerpt from my request_admin.cgi script:

my $filename  = $q->param('quote');
:
:
:
&parsePdf($filename);

#function to extract text from pdf ,save it in a text file and parse the required fields
sub parsePdf($)
{
    my $i;
    print $_[0];
    $filein = "quote_uploads/$_[0]";
    $fileout = 'output.txt';

    print "inside parsePdf\n";

    open OUT, ">$fileout" or die "error: $!";

    open IN, '-|', "getpdftext.pl $filein" or die "error :$!" ;

    while(<IN>)
    {
        print "$i";
        $i++;
        print OUT;
    }

}

Answer 1

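Most likely:

your CGI script's environment is not complete enough to locate getpdftext.pl, and/or
the web server user does not have permission to execute it anyway.

Check your web server's error log to see whether it gives any indication of why this is not working.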
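To tell those two causes apart, it can help to call the script by an explicit path and check every step. The following is only a sketch: `run_script` is a hypothetical helper, not part of CAM::PDF, and the path you pass for getpdftext.pl depends on where it is installed on your system. It uses the list form of a piped `open`, which bypasses the shell, and runs the script through `$^X` (the current perl binary), so the script itself does not need its executable bit set.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a Perl script by explicit path and
# return a reference to an array of its output lines.
sub run_script {
    my ($script, @args) = @_;

    # Fail early with a clear message if the script is missing or unreadable.
    -f $script or die "script not found: $script\n";
    -r $script or die "script not readable by this user: $script\n";

    # List-form pipe open: no shell is involved, so spaces or shell
    # metacharacters in the arguments are passed through literally.
    open my $in, '-|', $^X, $script, @args
        or die "cannot start $script: $!\n";
    my @lines = <$in>;

    # For a pipe, close() returns false if the child exited non-zero.
    close $in
        or die "$script failed: "
             . ($! ? "error closing pipe: $!" : "exit status " . ($? >> 8))
             . "\n";

    return \@lines;
}
```

A call might look like `my $lines = run_script('/path/to/getpdftext.pl', $filein);`, with the real path filled in. Under a typical CGI setup the `die` messages end up in the web server's error log, which is where the answer above suggests looking.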

Answer 2

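In your particular case, it may be simpler and more direct to use CAM::PDF itself, which should be installed alongside getpdftext.pl anyway.

I looked at that script, and I think your parsePdf sub could just as easily be written as: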

#!/usr/bin/perl
use warnings;
use strict;

use CAM::PDF;

sub parsePdf {
    my $filein = "quote_uploads/$_[0]";
    my $fileout = 'output.txt';

    # Three-argument open keeps the mode separate from the file name.
    open my $out_fh, '>', $fileout or die "error: $!";

    my $doc = CAM::PDF->new($filein) || die "$CAM::PDF::errstr\n";
    my $i = 0;

    foreach my $p ($doc->rangeToArray(1,$doc->numPages()))
    {
        my $str = $doc->getPageText($p);
        if (defined $str)
        {
            CAM::PDF->asciify(\$str);
            print $i++;
            print $out_fh $str;
        }
    }
}
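
Because the file name passed to parsePdf comes from a request parameter, it may be worth checking it before building the path under quote_uploads/. This is a sketch only: `safe_upload_path` is a hypothetical helper, and the allowed character set and the `.pdf` extension requirement are assumptions that you may need to adjust for your uploads.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: map an uploaded file name to a path inside $dir,
# or return undef if the name looks unsafe.
sub safe_upload_path {
    my ($dir, $name) = @_;
    return unless defined $name && length $name;

    # Drop any directory components the client may have sent.
    my $base = basename($name);

    # Assumption: only plain names ending in .pdf are acceptable.
    return unless $base =~ /\A[A-Za-z0-9][A-Za-z0-9._-]*\.pdf\z/i;

    return "$dir/$base";
}
```

The result could then be passed to `CAM::PDF->new` in place of the interpolated string, after checking that it is defined.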

How to call a .pl file from a .cgi script
