Unable to read some parts of the PDF document using iText

Time: 2017-11-08 21:59:41

Tags: itext itext7

I have a bunch of PDF documents, and I am generally able to read all of them using the method iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage.

Some of the documents have a block of text which is not being read. For example, in the attached picture, I am unable to read the text in the region circled in yellow.

I guess that this entity is not a picture, because I am unable to copy and paste it using the mouse. Also, I am able to read images in the document by handling EventType.RENDER_IMAGE in a custom strategy object, and the circled region does not get extracted as an image.

Any suggestions on how this could be read?

Thanks, Sau

1 answer:

Answer 0 (score: 0)

If you get neither a RENDER_TEXT nor a RENDER_IMAGE event for that content, it is most likely drawn using vector graphics instructions.

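You can retrieve these instructions too, but what you get is a series of path definitions (move to, line to, curve to, ...) and path rendering (stroke, fill, ...) information, delivered as RENDER_PATH events.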
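
To check whether the region really is drawn as vector graphics, one option is a listener that subscribes only to RENDER_PATH and prints what arrives. The following is a minimal sketch, assuming iText 7 for .NET; the class name PathDumpListener and the choice of page 1 are made up for illustration, and the file path is taken from the command line.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical listener (name is an assumption): prints every path event on a page.
class PathDumpListener : IEventListener
{
    public int PathCount { get; private set; }

    public void EventOccurred(IEventData data, EventType type)
    {
        if (type != EventType.RENDER_PATH) return;
        var info = (PathRenderInfo)data;
        PathCount++;

        // GetOperation() is a bit mask of PathRenderInfo.STROKE and PathRenderInfo.FILL;
        // it is PathRenderInfo.NO_OP when the path is neither stroked nor filled.
        int op = info.GetOperation();
        bool stroked = (op & PathRenderInfo.STROKE) != 0;
        bool filled = (op & PathRenderInfo.FILL) != 0;
        Console.WriteLine($"path #{PathCount}: stroke={stroked}, fill={filled}");

        // Coordinates are reported as stored in the path; apply info.GetCtm()
        // if positions on the page are needed.
        foreach (Subpath subpath in info.GetPath().GetSubpaths())
        {
            foreach (IShape segment in subpath.GetSegments())
            {
                string kind = segment is BezierCurve ? "curve" : "line";
                IList<Point> points = segment.GetBasePoints();
                Point end = points[points.Count - 1];
                Console.WriteLine($"  {kind} to ({end.GetX()}, {end.GetY()})");
            }
        }
    }

    public ICollection<EventType> GetSupportedEvents()
    {
        // Subscribe to path events only.
        return new HashSet<EventType> { EventType.RENDER_PATH };
    }
}

class Program
{
    static void Main(string[] args)
    {
        var pdf = new PdfDocument(new PdfReader(args[0]));
        try
        {
            var listener = new PathDumpListener();
            new PdfCanvasProcessor(listener).ProcessPageContent(pdf.GetPage(1));
            Console.WriteLine($"{listener.PathCount} path(s) on page 1");
        }
        finally
        {
            pdf.Close();
        }
    }
}
```

If the page reports a large number of small filled paths, the "text" is probably glyph outlines drawn as shapes. In that case the path data contains no character information, so recovering the words would most likely require rendering the page and running OCR on it.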