I am using the following code snippet to convert the pages of a PDF document to images (.jpg) with the open-source iTextSharp v5.5.13 package.
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;                 // needed for keys.ElementAt(0)
    using iTextSharp.text.pdf;

    // pdfFullPath and pdfImgPath are defined elsewhere in my code.
    List<string> pdfImage = new List<string>();
    string fileName = Path.GetFileNameWithoutExtension(pdfFullPath);
    var pdf = new PdfReader(pdfFullPath);
    int n = pdf.NumberOfPages;
    for (int j = 1; j <= n; j++)
    {
        // Look up the page's resource dictionary and its XObject entries.
        var pg = pdf.GetPageN(j);
        var res = PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES)) as PdfDictionary;
        if (res == null) continue;
        var xobj = PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)) as PdfDictionary;
        if (xobj == null) continue;
        var keys = xobj.Keys;
        if (keys.Count == 0) continue;

        // Only the first XObject on the page is inspected.
        var obj = xobj.Get(keys.ElementAt(0));
        if (!obj.IsIndirect()) continue;
        var tg = PdfReader.GetPdfObject(obj) as PdfDictionary;
        if (tg == null) continue;
        var type = PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)) as PdfName;
        if (!PdfName.IMAGE.Equals(type)) continue;

        // Write the raw stream bytes to disk; this is only a valid JPEG
        // when the image stream is DCT-encoded.
        int xrefIndex = ((PRIndirectReference)obj).Number;
        var pdfStream = pdf.GetPdfObject(xrefIndex) as PRStream;
        if (pdfStream == null) continue;
        var data = PdfReader.GetStreamBytesRaw(pdfStream);
        var jpeg = Path.Combine(pdfImgPath, fileName + j + ".jpg");
        File.WriteAllBytes(jpeg, data);
        pdfImage.Add(Path.GetFileName(jpeg));
    }
    pdf.Close();
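This code only extracts the image embedded in each page of the PDF document, not an image of the whole page. I want the output to be n images, each containing all of the content of one page, where n is the number of pages in the PDF document. Can anyone help me figure out what modifications I should make to the code to get the desired output?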