Question

I am trying to read from a PDF file using a file stream, and I want to write the contents to a writer in cp1252 encoding. Here is the code:

byte buf[] = new byte[8192];
InputStream is = new FileInputStream(f); 
ByteArrayOutputStream oos = new ByteArrayOutputStream(); 
int c=0; 
while ((c = is.read(buf)) != -1) { 
   oos.write(buf, 0, c); 
}
byte out[] = oos.toByteArray();
String str = new String(out, "UTF-8"); // decodes the raw PDF bytes as UTF-8
char[] ch = str.toCharArray();
writer.write(ch);
is.close(); 
oos.close();

But the output is wrong: the text is unreadable (it is not converted correctly). How can I fix this?

Answer 1

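The problem is probably in how you read the PDF file. A PDF is a binary format rather than plain text, so decoding its raw bytes as UTF-8 does not give you the text. Try using PDFBox to extract the text from the PDF file; it is probably one of the best ways to do this. Once you have the text you need, you can save it in cp1252 encoding.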
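To illustrate why the code in the question produces unreadable text, here is a minimal sketch using only the standard library. The byte values are an assumption chosen for illustration: they mimic the start of a PDF file, an ASCII header followed by a few bytes above 0x7F. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class BinaryDecodeSketch {

    // Decodes arbitrary bytes as UTF-8, the same way the question's code does.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Illustrative bytes only: "%PDF-" in ASCII, then four bytes that do not
    // form valid UTF-8 sequences.
    static byte[] sampleBytes() {
        return new byte[] {
            0x25, 0x50, 0x44, 0x46, 0x2D,
            (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3
        };
    }

    public static void main(String[] args) {
        String decoded = decodeAsUtf8(sampleBytes());
        // Malformed input is replaced with U+FFFD, so the original bytes are lost.
        System.out.println("contains replacement char: " + decoded.contains("\uFFFD"));
    }
}
```

Once bytes have been replaced with U+FFFD during decoding, no later re-encoding to cp1252 can recover them, which is why the text has to be extracted with a PDF library first.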

You can see a text extraction example using PDFBox here.

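As for converting to cp1252: if you are on a Windows machine, the default encoding is usually cp1252 (this depends on the system locale). So simply saving the text will, hopefully, store it in cp1252 encoding.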
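Rather than relying on the platform default, the charset can be passed to the writer explicitly. The following is a minimal sketch under that assumption; the class name, method name, and sample string are hypothetical, and the text is assumed to have already been extracted from the PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class Cp1252WriterSketch {

    // "windows-1252" is the standard charset name for cp1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Writes already-extracted text through a Writer that encodes as cp1252
    // and returns the resulting bytes.
    static byte[] writeAsCp1252(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, CP1252)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Sample text: "café €5", written with Unicode escapes so the source
        // file's own encoding does not matter.
        byte[] bytes = writeAsCp1252("caf\u00e9 \u20ac5");
        System.out.println("bytes written: " + bytes.length);
    }
}
```

To write to a file instead, the `ByteArrayOutputStream` can be swapped for a `FileOutputStream`. Characters that have no cp1252 representation are replaced with `?` by `OutputStreamWriter`, so it may be worth checking the extracted text for such characters first.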
