Question

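I am trying to extract text, images, and tables from a PDF file using Perl.

I tried CAM::PDF, but instead of giving me plain text it returns the content in some other format.

Is there a way to extract text/images/tables from a PDF using a Perl module?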

Answer 1

@priya: I tried this module, and it works for extracting text from a PDF.

use strict;
use warnings;
use PDF::OCR::Thorough;

# Path to the PDF to read
my $filename = "pdf.pdf";

# Extract the text and print it
my $pdf  = PDF::OCR::Thorough->new($filename);
my $text = $pdf->get_text();
print $text;

Answer 2

Use CAM::PDF. It has methods that can help you extract images or other elements:

$doc->getProperty($pagenum, $propertyname)
Each PDF page contains a list of resources that it uses (images, fonts, etc). getPropertyNames() returns an array of the names of those resources. getProperty() returns a node representing a named property (most likely a reference node).
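As a rough sketch of how these calls might fit together, the script below walks every page, prints its text with getPageText(), and lists the page's resource names with getPropertyNames() and getProperty(). The file name comes from the command line, and reading the node's `type` field assumes the returned node is a hash-based object, which you should verify against your installed CAM::PDF version. It does not attempt table extraction.

```perl
use strict;
use warnings;
use CAM::PDF;

# File name is taken from the command line (an assumption for this sketch)
my $filename = shift @ARGV or die "usage: $0 file.pdf\n";

my $doc = CAM::PDF->new($filename)
    or die "Cannot open $filename: $CAM::PDF::errstr\n";

for my $pagenum (1 .. $doc->numPages()) {
    print "--- page $pagenum ---\n";

    # getPageText() returns the page's text, or undef if none could be read
    my $text = $doc->getPageText($pagenum);
    print defined $text ? $text : '(no text found)', "\n";

    # List the resources (images, fonts, etc.) the page refers to
    for my $name ($doc->getPropertyNames($pagenum)) {
        my $node = $doc->getProperty($pagenum, $name);

        # Assumption: the node is a hash-based object with a 'type' field
        my $type = (ref $node && defined $node->{type}) ? $node->{type} : 'unknown';
        print "resource: $name ($type)\n";
    }
}
```

If getPageText() comes back empty for a scanned document, the pages are probably images, and an OCR-based module such as the one in Answer 1 is likely the better fit.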
