I managed to extract text from a PDF version 1.2 file with PdfSharp, using this link as a reference.
My code for extracting the text:
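    // Requires: using PdfSharp.Pdf.Content.Objects;
    // Recursively walks a parsed content stream and appends every string
    // operand of the Tj/TJ text-showing operators to pdfcontentstr.
    private string ExtractText(CObject cObject, ref string pdfcontentstr)
    {
        if (cObject is COperator)
        {
            var cOperator = cObject as COperator;
            // Only the text-showing operators are of interest here.
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                {
                    ExtractText(cOperand, ref pdfcontentstr);
                }
            }
        }
        else if (cObject is CSequence)
        {
            // A sequence (e.g. the array operand of TJ) is walked element by element.
            var cSequence = cObject as CSequence;
            foreach (var element in cSequence)
            {
                ExtractText(element, ref pdfcontentstr);
            }
        }
        else if (cObject is CString)
        {
            // cString.Value is appended as-is, separated by ';'.
            var cString = cObject as CString;
            pdfcontentstr = pdfcontentstr + ";" + cString.Value;
        }
        return pdfcontentstr;
    }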
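However, when I try to extract text from a PDF version 1.3 file (with the same content), the program returns unreadable content, for example: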
0%0O0R0F0N00%0
Actual content in the PDF file: B座
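To look at the garbled output more closely, here is a minimal sketch that pairs up its characters and prints each pair as one 16-bit code. It rests on an assumption I have not confirmed: that the string holds two-byte character codes read one byte at a time (PDF composite fonts can use multi-byte codes). It only dumps the numbers and does not map them to Unicode, which would need the font's encoding or ToUnicode map. `GlyphCodeDump` and `ToTwoByteCodes` are hypothetical helper names, not part of PdfSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GlyphCodeDump
{
    // Hypothetical helper: treats each pair of characters as one
    // big-endian 16-bit code (first character = high byte).
    // A trailing unpaired character is ignored.
    public static List<ushort> ToTwoByteCodes(string raw)
    {
        var codes = new List<ushort>();
        for (int i = 0; i + 1 < raw.Length; i += 2)
        {
            int high = raw[i] & 0xFF;
            int low = raw[i + 1] & 0xFF;
            codes.Add((ushort)((high << 8) | low));
        }
        return codes;
    }

    public static void Main()
    {
        // First ten characters of the garbled output shown above.
        var codes = ToTwoByteCodes("0%0O0R0F0N");
        Console.WriteLine(string.Join(" ", codes.Select(c => c.ToString("X4"))));
        // prints: 3025 304F 3052 3046 304E
    }
}
```

If the assumption holds, every other character being `0` (0x30) would simply be the high byte of each code, which might explain the pattern in the output.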
Can anyone help? Thanks in advance.