When I extract text from a PDF, exponents do not stay inline with the rest of the line. How can I fix this?
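using System.util;                    // RectangleJ (iTextSharp 5.x)
using iTextSharp.text.pdf;            // PdfReader
using iTextSharp.text.pdf.parser;     // text extraction strategies and filters

string text = string.Empty;
using (PdfReader reader = new PdfReader(fileLocation))
{
    ITextExtractionStrategy strategy;
    RenderFilter[] filter = new RenderFilter[1];
    // Pages are 1-based; as written, this loop skips the first and the last page.
    for (int page = 2; page < reader.NumberOfPages; page++)
    {
        RectangleJ mediaBox = reader.GetPageSize(page);
        // Restrict extraction to the page body: cut 60 units off the bottom and 140 off the top.
        filter[0] = new RegionTextRenderFilter(new RectangleJ(mediaBox.Left, mediaBox.Bottom + 60, mediaBox.Right, mediaBox.Top - 140));
        // A fresh location-based strategy per page, wrapped in the region filter.
        strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
        text += PdfTextExtractor.GetTextFromPage(reader, page, strategy) + "\n\n";
    }
}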
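If a line of text in the PDF reads 2.9 x 10^-4 m^3 (with -4 and 3 typeset as superscripts), the result after extracting the text is:
-4 3
2.9 x 10 m
But it should be 2.9 x 10^-4 m^3.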