I have been using the following C# code to read text from a PDF file:
PdfReader reader = new PdfReader(openFileDialog1.FileName);
int n = reader.NumberOfPages;
// File properties (document metadata such as title and author)
Dictionary<string, string> infodict = reader.Info;
string strText = string.Empty;
for (int page = 1; page <= n; page++)
{
    ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
    String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
    s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
    strText = strText + s;
}
// Close the reader once, after the loop, rather than on every iteration.
reader.Close();
MessageBox.Show(strText);
This code fails to read the symbols in the PDF file. Is there a way to read symbols from a PDF file as well?
Answer 0 (score: 0)
Try this: use LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy.
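A minimal sketch of that suggestion, assuming iTextSharp 5.x (the version that provides the iTextSharp.text.pdf.parser namespace used in the question). The class name PdfTextReader and the method ExtractText are hypothetical names chosen for illustration. The sketch also leaves out the Encoding.Default round trip from the question: the extracted text is already a .NET string, and converting it through the system code page can replace characters that code page cannot represent with '?', which may be why the symbols disappear. That is an assumption about the cause, not something confirmed in the post.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfTextReader
{
    // Hypothetical helper (not from the original post): returns the text of every page.
    public static string ExtractText(string path)
    {
        var text = new StringBuilder();
        var reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text from earlier pages is not carried over.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();

                // No re-encoding step: the returned string is appended as-is.
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            // Close the reader once, after all pages have been processed.
            reader.Close();
        }
        return text.ToString();
    }
}
```

With the form from the question, it could be called as MessageBox.Show(PdfTextReader.ExtractText(openFileDialog1.FileName)). If symbols are still missing, the PDF's fonts may not map those glyphs to Unicode, in which case no extraction strategy can recover them.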