Answer 0 (score: 1)
@priya: I tried this module, and it works for extracting text from a PDF.
use strict;
use warnings;
use PDF::OCR::Thorough;
my $filename = "pdf.pdf";
my $pdf = PDF::OCR::Thorough->new($filename);
my $text = $pdf->get_text();
print $text;
Answer 1 (score: 0)
Use CAM::PDF. It has methods that can help you extract images or other elements:
$doc->getProperty($pagenum, $propertyname)
Each PDF page contains a list of resources that it uses (images, fonts, etc). getPropertyNames() returns an array of the names of those resources. getProperty() returns a node representing a named property (most likely a reference node).
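As a rough illustration of the two calls described above, the sketch below opens a PDF and lists the resource names on each page. The file path comes from the command line and is only a placeholder. Printing the node's `type` field assumes the returned node is a hash reference as described in the CAM::PDF::Node documentation; treat this as a starting point rather than a finished extractor.

```perl
use strict;
use warnings;
use CAM::PDF;

# The input path is taken from the command line (placeholder, not from the answer).
my $filename = shift @ARGV or die "usage: $0 file.pdf\n";

my $doc = CAM::PDF->new($filename)
    or die "Could not open $filename: $CAM::PDF::errstr\n";

# Pages are numbered from 1.
for my $pagenum (1 .. $doc->numPages()) {
    print "Page $pagenum\n";

    # Names of the resources (images, fonts, etc.) that this page uses.
    for my $name ($doc->getPropertyNames($pagenum)) {
        my $node = $doc->getProperty($pagenum, $name);
        next if !$node;

        # Assumption: the node is a hash ref with a 'type' field
        # (most likely 'reference', per the documentation quoted above).
        printf "  %-12s %s\n", $name, $node->{type};
    }
}
```

Run it as `perl list_resources.pl pdf.pdf` (the script name is hypothetical). Getting from a reference node to the actual image data takes further steps that are not shown here.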