Unable to read some parts of the PDF document using iText

Time: 2017-11-08 21:59:41

Tags: itext itext7

I have a bunch of PDF documents, and I am generally able to read all of them using the method iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage.

Some of the documents have a block of text which is not being read. For example, in the attached picture, I am unable to read the text in the region circled in yellow.

I guess that this entity is not a picture, because I am unable to copy and paste it using the mouse. Also, I am able to read images in the document by handling EventType.RENDER_IMAGE in a custom strategy object, and the encircled region does not get extracted as an image.

Any suggestions on how this could be read?

Thanks, Sau

1 answer:

Answer 0 (score: 0)

If you get neither RENDER_TEXT nor RENDER_IMAGE events for that content, it is most likely drawn using vector graphics instructions.
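
One way to check this is to count the events the parser emits for the affected page. The following is a minimal sketch, assuming iText 7 for .NET; the class name `EventCounter`, the file name `input.pdf` and the page number are placeholders and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical helper: tallies how many events of each type a page produces.
class EventCounter : IEventListener
{
    public readonly Dictionary<EventType, int> Counts = new Dictionary<EventType, int>();

    public void EventOccurred(IEventData data, EventType type)
    {
        int seen;
        Counts.TryGetValue(type, out seen); // seen stays 0 if the type is new
        Counts[type] = seen + 1;
    }

    // Returning null asks the processor to deliver every event type.
    public ICollection<EventType> GetSupportedEvents()
    {
        return null;
    }
}

class Program
{
    static void Main()
    {
        // "input.pdf" and page 1 are placeholders for the problem document and page.
        PdfDocument pdf = new PdfDocument(new PdfReader("input.pdf"));
        try
        {
            EventCounter counter = new EventCounter();
            new PdfCanvasProcessor(counter).ProcessPageContent(pdf.GetPage(1));
            foreach (KeyValuePair<EventType, int> entry in counter.Counts)
            {
                Console.WriteLine(entry.Key + ": " + entry.Value);
            }
        }
        finally
        {
            pdf.Close();
        }
    }
}
```

A high RENDER_PATH count on a page whose text is missing is consistent with glyphs drawn as outlines, but it is not proof on its own, because rules, borders and other line art are reported as paths too.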

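You can retrieve those instructions as well, but what you get is a series of path definitions (move to, line to, curve to, ...) and path rendering (stroke, fill, ...) information as RENDER_PATH events.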
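
To see what those events look like, a listener can subscribe to RENDER_PATH only and print each path's segments. This is a rough sketch under the assumption of iText 7 for .NET; `PathDumper`, `input.pdf` and the page number are illustrative names rather than anything from the question.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical listener: prints how each path is painted and what it is made of.
class PathDumper : IEventListener
{
    public void EventOccurred(IEventData data, EventType type)
    {
        if (type != EventType.RENDER_PATH)
        {
            return;
        }
        PathRenderInfo info = (PathRenderInfo)data;

        // The operation is a bit mask; NO_OP (0) means the path is not painted,
        // which happens for paths used only for clipping.
        int op = info.GetOperation();
        bool stroke = (op & PathRenderInfo.STROKE) != 0;
        bool fill = (op & PathRenderInfo.FILL) != 0;
        Console.WriteLine("path: stroke=" + stroke + ", fill=" + fill);

        foreach (Subpath subpath in info.GetPath().GetSubpaths())
        {
            foreach (IShape segment in subpath.GetSegments())
            {
                // Base points are in the path's own coordinate system;
                // apply info.GetCtm() to map them onto the page.
                Console.WriteLine("  " + segment.GetType().Name + ", "
                    + segment.GetBasePoints().Count + " base points");
            }
        }
    }

    // Only path events are of interest here.
    public ICollection<EventType> GetSupportedEvents()
    {
        return new List<EventType> { EventType.RENDER_PATH };
    }
}

class Program
{
    static void Main()
    {
        // "input.pdf" and page 1 are placeholders for the problem document and page.
        PdfDocument pdf = new PdfDocument(new PdfReader("input.pdf"));
        try
        {
            new PdfCanvasProcessor(new PathDumper()).ProcessPageContent(pdf.GetPage(1));
        }
        finally
        {
            pdf.Close();
        }
    }
}
```

The output is geometry only. If the encircled text was drawn this way, turning it back into characters would require something like OCR on a rendering of the page, since the path data carries no character codes.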