Question

I need to convert a PDF to a byte array and vice versa.

Can anyone help me?

This is how I convert it to a byte array:

public static byte[] convertDocToByteArray(String sourcePath) {

    byte[] byteArray=null;
    try {
        InputStream inputStream = new FileInputStream(sourcePath);


        String inputStreamToString = inputStream.toString();
        byteArray = inputStreamToString.getBytes();

        inputStream.close();
    } catch (FileNotFoundException e) {
        System.out.println("File Not found"+e);
    } catch (IOException e) {
                System.out.println("IO Ex"+e);
    }
    return byteArray;
}

If I use the following code to convert it back to a document, a PDF file is created, but opening it reports 'Bad Format. Not a pdf'.

public static void convertByteArrayToDoc(byte[] b) {          

    OutputStream out;
    try {       
        out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
        out.close();
        System.out.println("write success");
    }catch (Exception e) {
        System.out.println(e);
    }
}

Answer 1

You basically need a helper method to read a stream into memory. This works pretty well:

public static byte[] readFully(InputStream stream) throws IOException
{
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    int bytesRead;
    while ((bytesRead = stream.read(buffer)) != -1)
    {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}

Then you'd call it with:

public static byte[] loadFile(String sourcePath) throws IOException
{
    InputStream inputStream = null;
    try 
    {
        inputStream = new FileInputStream(sourcePath);
        return readFully(inputStream);
    } 
    finally
    {
        if (inputStream != null)
        {
            inputStream.close();
        }
    }
}

Don't mix up text and binary data; it only leads to tears.
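To make the text-versus-binary point concrete, here is a minimal sketch of what a detour through String can do to PDF-like bytes. The class name, the method name, and the sample bytes are made up for illustration, and UTF-8 is an assumed charset; the question's code used the platform default, which can behave differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TextVsBinaryDemo {
    /** Decodes the bytes as UTF-8 text and encodes them again, as a String detour would. */
    static byte[] roundTripThroughString(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four high-bit bytes that are not valid UTF-8 (illustrative sample)
        byte[] original = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] roundTripped = roundTripThroughString(original);
        System.out.println("identical after round trip: " + Arrays.equals(original, roundTripped));
    }
}
```

The high-bit bytes are not valid UTF-8 sequences, so the decoder substitutes replacement characters and the re-encoded array no longer matches the input. Plain ASCII bytes survive, which is why this kind of bug can go unnoticed with text files.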

Answer 2

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;

Path pdfPath = Paths.get("/path/to/file.pdf");
byte[] pdf = Files.readAllBytes(pdfPath);

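EDIT:

Thanks to Farooque for pointing out that this works for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and so can be read into a byte[].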
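For the "vice versa" half of the question, Files.write() is the counterpart in the same class. A small sketch follows; the class and method names are placeholders invented for this example, not part of any library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfRoundTrip {
    /** Reads the whole file into memory as raw bytes. */
    static byte[] readPdf(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    /** Writes the bytes back out unchanged. */
    static void writePdf(Path target, byte[] pdf) throws IOException {
        // With no options given, Files.write creates the file or truncates an existing one
        Files.write(target, pdf);
    }
}
```

Usage would look something like PdfRoundTrip.writePdf(Paths.get("copy.pdf"), PdfRoundTrip.readPdf(Paths.get("original.pdf"))), where both file names are placeholders.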

Answer 3

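The problem is that you are calling toString() on the InputStream object itself. This returns a String representation of the InputStream object, not the actual PDF document.

You want to read the PDF only as bytes, because PDF is a binary format. You will then be able to write out that same byte array and it will be a valid PDF, since it has not been modified.

e.g. to read a file as bytes: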

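File file = new File(sourcePath);
// file.length() returns a long, so it needs a cast (this assumes the file fits in an int-sized array)
byte[] bytes = new byte[(int) file.length()];
try (DataInputStream inputStream = new DataInputStream(new FileInputStream(file))) {
    // readFully keeps reading until the array is full; a single read() may return early
    inputStream.readFully(bytes);
}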
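To see what toString() actually hands back, here is a tiny sketch. It uses a ByteArrayInputStream as a stand-in for the FileInputStream in the question, on the assumption that both simply inherit Object.toString(); the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

final class StreamToStringDemo {
    static String describe(InputStream stream) {
        // Inherited Object.toString(): "<class name>@<hash code>"; no bytes are read from the stream
        return stream.toString();
    }

    public static void main(String[] args) {
        InputStream stream = new ByteArrayInputStream(new byte[] {0x25, 0x50, 0x44, 0x46});
        // Prints the class name and a hash code, not the stream's content
        System.out.println(describe(stream));
    }
}
```

Calling getBytes() on that string gives you the bytes of the description, which is why the file written afterwards is not a PDF.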

Answer 4

You can do it with Apache Commons IO without worrying about the internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns data of type byte[].


Answer 5

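Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():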

InputStream inputStream = new FileInputStream(sourcePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

int data;
while( (data = inputStream.read()) >= 0 ) {
    outputStream.write(data);
}

inputStream.close();
return outputStream.toByteArray();
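If you are on Java 9 or later, InputStream.readAllBytes() does the same draining for you. A minimal sketch, assuming a Java 9+ runtime; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;

final class ReadAllBytesSketch {
    /** Drains the stream into a byte array and closes it afterwards. */
    static byte[] toByteArray(InputStream stream) throws IOException {
        try (InputStream in = stream) {
            return in.readAllBytes(); // available since Java 9
        }
    }
}
```

It would be called as ReadAllBytesSketch.toByteArray(new FileInputStream(sourcePath)). Internally it reads in chunks, so it also avoids the one-byte-at-a-time loop.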

Answer 6

Aren't you creating the PDF file without actually writing the byte array back to it? That would be why you can't open the PDF.

out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
out.write(b, 0, b.length); // actually write the bytes before closing
out.close();

This is in addition to reading the PDF into the byte array correctly.
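Put together as a complete method, the write side might look like the sketch below. The targetPath parameter and the class name are additions for this example (the question hard-codes the path), and try-with-resources is used so the stream is closed even if the write fails.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class ByteArrayToPdf {
    /** Writes the given bytes to targetPath, replacing any existing file. */
    static void convertByteArrayToDoc(byte[] b, String targetPath) throws IOException {
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(b); // the step the original method skipped
        }
    }
}
```

Letting the IOException propagate, rather than printing it and carrying on, also means a failed write can't be mistaken for "write success".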

Answer 7

InputStream is = getResources().openRawResource(+ R.drawable.icon);

Answer 8

Convert a PDF to a byte array:

public byte[] pdfToByte(String filePath)throws JRException {

         File file = new File(filePath);
         FileInputStream fileInputStream;
         byte[] data = null;
         byte[] finalData = null;
         ByteArrayOutputStream byteArrayOutputStream = null;

         try {
            fileInputStream = new FileInputStream(file);
            data = new byte[(int)file.length()];
            finalData = new byte[(int)file.length()];
            byteArrayOutputStream = new ByteArrayOutputStream();

            fileInputStream.read(data);
            byteArrayOutputStream.write(data);
            finalData = byteArrayOutputStream.toByteArray();

            fileInputStream.close(); 

        } catch (FileNotFoundException e) {
            LOGGER.info("File not found" + e);
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
        }

        return finalData;

    }

Answer 9

This worked for me:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){
    byte[] buffer = new byte[1024];
    int bytesRead;
    while((bytesRead = pdfin.read(buffer))!=-1){
        pdfout.write(buffer,0,bytesRead);
    }
}

But Jon's answer did not work for me when used in the following way:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){

    int k = readFully(pdfin).length;
    System.out.println(k);
}

It prints zero as the length. Why is that?

Answer 10

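None of these worked for us, possibly because our input stream was bytes coming from a REST call rather than from a locally hosted PDF file. What did work was using RestAssured to read the PDF as an input stream, then using the Tika PDF reader to parse it, and then calling the toString() method.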

import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.jayway.restassured.response.ResponseBody;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

            InputStream stream = response.asInputStream();
            Parser parser = new AutoDetectParser(); // Should auto-detect!
            ContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            ParseContext context = new ParseContext();

            try {
                parser.parse(stream, handler, metadata, context);
            } finally {
                stream.close();
            }
            for (int i = 0; i < metadata.names().length; i++) {
                String item = metadata.names()[i];
                System.out.println(item + " -- " + metadata.get(item));
            }

            System.out.println("!!Printing pdf content: \n" +handler.toString());
            System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

Answer 11

I have implemented similar behavior in my application. Below is my version of the code, which works correctly.

byte[] getFileInBytes(String filename) {
    File file = new File(filename);
    int length = (int) file.length();
    byte[] bytes = new byte[length];
    try {
        BufferedInputStream reader = new BufferedInputStream(new FileInputStream(file));
        reader.read(bytes, 0, length);
        System.out.println(reader);
        // setFile(bytes);
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return bytes;
}
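One caveat with a single read(bytes, 0, length) call: the InputStream contract allows it to return fewer bytes than requested, so the array can end up only partly filled. The sketch below simulates that with a made-up wrapper (trickle, which hands out at most 3 bytes per call) and shows DataInputStream.readFully as one way around it. All names here are hypothetical and exist only for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ShortReadDemo {
    /** Hypothetical stream that returns at most 3 bytes per read call, as a slow source might. */
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    /** Issues one read call and reports how many bytes it actually delivered. */
    static int singleRead(byte[] data) throws IOException {
        byte[] target = new byte[data.length];
        try (InputStream in = trickle(data)) {
            return in.read(target, 0, target.length);
        }
    }

    /** Uses readFully, which loops internally until the whole array is filled. */
    static byte[] fullRead(byte[] data) throws IOException {
        byte[] target = new byte[data.length];
        try (DataInputStream in = new DataInputStream(trickle(data))) {
            in.readFully(target);
        }
        return target;
    }
}
```

With ten bytes of input, singleRead delivers only the first three, while fullRead returns all ten. Local files usually come back in one go, which is probably why code like the above appears to work, but it is safer not to rely on that.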

Answer 12

This worked for me. I didn't use any third-party libraries, just what ships with Java.

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PDFUtility {

public static void main(String[] args) throws IOException {
    /**
     * Converts byte stream into PDF.
     */
    PDFUtility pdfUtility = new PDFUtility();
    byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf");
    fileOutputStream.write(byteStreamPDF);
    fileOutputStream.close();
    System.out.println("File written successfully");
}

/**
 * Creates PDF to Byte Stream
 *
 * @return
 * @throws IOException
 */
protected byte[] convertPDFtoByteStream() throws IOException {
    Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
    return Files.readAllBytes(path);
}

}

Answer 13

A PDF may contain binary data, and it may well get corrupted when you call toString. It seems to me that you want this:

FileInputStream inputStream = new FileInputStream(sourcePath);

int numberBytes = inputStream.available();
byte[] bytearray = new byte[numberBytes];

inputStream.read(bytearray);

PDF to byte array and vice versa
