Writing a long string to an HTML file: InputStream vs FileWriter vs BufferedReader

Posted: 2015-04-03 23:07:40

Tags: java inputstream nio bufferedreader filewriter

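I'm confused about the best way to do this. I've seen many examples on SO, and many answers give different solutions. So I'd like to know the most efficient way to write a very long string to a new HTML file (i.e. create an HTML file from a single string). Is it really preferable to wrap everything in a buffer? Like: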

    fileWriter = new FileWriter(new File(dir, appBook.getPath()));
    bufferWritter = new BufferedWriter(fileWriter);
    bufferWritter.append(htmlContent);
    bufferWritter.close(); // flushes the buffer and closes the underlying FileWriter

Or can I just do the following instead (without losing performance)?

    fileWriter = new FileWriter(new File(dir, appBook.getPath()));
    fileWriter.append(htmlContent);
    fileWriter.close(); // flushes and closes the file
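Whichever variant you pick, note that FileWriter's classic constructors always encode with the platform default charset (a Charset-taking constructor was only added in Java 11). A minimal sketch, assuming Java 7+ and that you want to choose the output charset yourself; the class and method names are hypothetical. It opens a buffered writer with an explicit charset through java.nio.file.Files:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlFileWriter {

    // Writes the whole string through a BufferedWriter whose charset is explicit,
    // instead of relying on FileWriter's platform-default charset.
    static void writeHtml(Path target, String htmlContent, Charset charset) throws IOException {
        // Files.newBufferedWriter creates or truncates the file; try-with-resources
        // closes (and therefore flushes) the writer even if write() throws.
        try (BufferedWriter writer = Files.newBufferedWriter(target, charset)) {
            writer.write(htmlContent);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default output path, for illustration only.
        Path out = Paths.get(args.length > 0 ? args[0] : "book.html");
        writeHtml(out, "<html><body><p>caf\u00e9 \u2013 test</p></body></html>", StandardCharsets.UTF_8);
    }
}
```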

...

This is the approach I've been using for a while:

    // Will run out of memory if I don't split the string into 650000 chunks
    String[] bookPieces = splitString(htmlContent, Math.round(htmlContent.length()/650000));
    OutputStream outputStream = null;
    InputStream inputStream = null;

    try {
        outputStream = new FileOutputStream(new File(dir, appBook.getPath())); //.html path
        for (String text : bookPieces) {
            byte[] theBytes = text.getBytes(Charset.forName("UTF-16"));
            inputStream = new ByteArrayInputStream(theBytes);
            byte[] bufferData = new byte[1024];
            int bytesRead = inputStream.read(bufferData);

            while (bytesRead != -1) {
                outputStream.write(bufferData, 0, bytesRead); //add the bufferData data to the "new file"
                bytesRead = inputStream.read(bufferData); // keep reading and filling the byte array until read() returns -1
            }
            //need to GC the inputStream myself!!!!
            inputStream = null;

        }
        toReturn = true;

    } 
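One concrete problem with encoding each chunk separately: Java's UTF-16 charset writes a big-endian byte-order mark (FE FF) whenever it encodes non-empty text, so every getBytes call in the loop above puts a fresh BOM into the middle of the file. (Splitting at arbitrary indices can also cut a surrogate pair in half, and the ByteArrayInputStream copy loop is unnecessary: outputStream.write(theBytes) writes the same bytes.) A minimal sketch demonstrating the BOM effect; the class and method names are illustrative only:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ChunkedUtf16Demo {

    // Encodes each chunk independently, the way the loop above calls getBytes
    // once per piece, and returns the total number of bytes produced.
    static int bytesWhenEncodedPerChunk(Charset cs, String... chunks) {
        int total = 0;
        for (String chunk : chunks) {
            total += chunk.getBytes(cs).length; // each call uses a fresh encoder
        }
        return total;
    }

    public static void main(String[] args) {
        // One encode: BOM (2 bytes) + 2 chars * 2 bytes = 6 bytes.
        System.out.println("ab".getBytes(StandardCharsets.UTF_16).length);
        // Two encodes: each chunk carries its own 2-byte BOM, so 4 + 4 = 8 bytes.
        System.out.println(bytesWhenEncodedPerChunk(StandardCharsets.UTF_16, "a", "b"));
        // UTF-8 writes no BOM, so splitting ASCII text changes nothing: 1 + 1 = 2 bytes.
        System.out.println(bytesWhenEncodedPerChunk(StandardCharsets.UTF_8, "a", "b"));
    }
}
```

Encoding through a single Writer (one encoder for the whole file) produces exactly one BOM at the start.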

Then I read that BufferedReader is preferred for handling long text strings, so I changed it to:

    String[] bookPieces = splitString(htmlContent, Math.round(htmlContent.length()/650000));
    OutputStream outputStream = null;
    InputStream inputStream = null;

    OutputStreamWriter oo;

    try {
        outputStream = new FileOutputStream(new File(dir, appBook.getPath()));
        for (String text : bookPieces) {

            byte[] theBytes = text.getBytes(Charset.forName("UTF-16"));
            inputStream = new ByteArrayInputStream(theBytes);

            InputStreamReader iReader = new InputStreamReader(inputStream,Charset.forName("UTF-16"));
            BufferedReader bufferedReader = new BufferedReader(iReader);

            oo = new OutputStreamWriter(outputStream);

            String nextLine;

            while ((nextLine = bufferedReader.readLine())!=null) {
                oo.write(nextLine);
            }
            //need to GC the inputStream myself!!!!
            inputStream = null;

        } 

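But I can't get the encoding right with that approach: some characters come out different, e.g. “-” becomes “〔”. I still have to split the string into chunks, so I don't see the point of the change. (Am I implementing this wrong? Please tell me the right way to do it with BufferedReader.)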
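Some things in the BufferedReader version above are certainly off: readLine() strips line terminators, so the HTML loses its line breaks; new OutputStreamWriter(outputStream) without a charset argument encodes with the platform default charset rather than UTF-16 (a likely source of the changed characters); and in the code shown, oo is never flushed or closed, so its last buffered characters may never reach the file. A minimal sketch, assuming the goal is simply to copy the String to a file through a Reader; the class and method names are hypothetical, and a fixed char[] buffer replaces readLine() so no manual splitting is needed:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReaderCopy {

    // Copies every char from in to out through a fixed buffer. Unlike readLine(),
    // this keeps line terminators and never needs the caller to split the text.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // One reader over the String and one writer with an explicit charset for the
    // whole output. BufferedReader is optional here: StringReader is already in memory.
    static void writeHtml(OutputStream target, String html, Charset cs) throws IOException {
        try (Reader in = new BufferedReader(new StringReader(html));
             Writer out = new BufferedWriter(new OutputStreamWriter(target, cs))) {
            copy(in, out);
        } // closing out flushes the buffer and the encoder, then closes target
    }

    public static void main(String[] args) throws IOException {
        String html = "<p>line 1 \u2013 dash</p>\n<p>line 2</p>\n";
        // Hypothetical output path, for illustration only.
        writeHtml(Files.newOutputStream(Paths.get("book.html")), html, StandardCharsets.UTF_8);
    }
}
```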

...and I finally found two faster approaches that don't even require splitting the string into many chunks.

    String[] bookPieces = splitString(htmlContent, Math.round(htmlContent.length()/100));
    FileWriter fileWriter = null;
    BufferedWriter bufferWritter = null;
    try {
        fileWriter = new FileWriter(new File(dir, appBook.getPath()));
        bufferWritter = new BufferedWriter(fileWriter);

        // Has to be append(); with write() I get an OOM.
        bufferWritter.append(htmlContent);

        toReturn = true;

    }

And this one, which is not much slower than the one above but lets me specify the encoding:

    // Need to split large strings into 100 chunks
    String[] bookPieces = splitString(htmlContent, Math.round(htmlContent.length()/100));
    BufferedWriter bufferWritter = null;
    OutputStreamWriter osw= null;
    try {
        // Create osw and assign it an Encoding
        osw = new OutputStreamWriter(
                new FileOutputStream(new File(dir, appBook.getPath())),
                Charset.forName("UTF-16"));
        bufferWritter = new BufferedWriter(osw);
        for (String text : bookPieces) {
            bufferWritter.write(text); //write faster than append here
        }

        toReturn = true;

    }

1 Answer:

Answer 0 (score: 1)

Here is a simple but more efficient way to write this code, IMO:

    // BufferedWriter rejects a buffer size <= 0, so guard against an empty string
    int buffSize = Math.max(1, Math.min(65536, htmlContent.length()));
    try (Writer osw = new OutputStreamWriter(
                new FileOutputStream(new File(dir, appBook.getPath())),
                Charset.forName("UTF-16"));
         BufferedWriter bw = new BufferedWriter(osw, buffSize)) {
        bw.write(htmlContent);
    }

Notes on the code:

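  1. This version does not split the text. The BufferedWriter.write(String) code takes the characters out of the string, converts them and writes them out in chunks sized to the BufferedWriter's buffer. Doing your own chunking on top of that is pointless.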

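  2. This version sets the BufferedWriter's buffer size based on the size of the string being written. But beyond a certain size (and 65K is a guess), you won't get any performance benefit from increasing the buffer size further.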

  3. This version uses try-with-resources to prevent resource leaks.


  4. Further thoughts:

    You might get more performance by using NIO.

    You might get even more performance by using nasty reflection to access the String object's private char array. (Don't do this. It's a bad idea!)

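    A better approach might be to not assemble the HTML into one huge String at all. Instead, write the characters/strings that make up the HTML directly to the BufferedWriter. That way you never need to hold the entire HTML in memory at once.¹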


    1 - Assuming you use a StringBuilder with no size hint, you need up to 3N chars' worth of char[] to assemble a String of size N. With a good size hint, you only need 2N chars...
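The "write the HTML directly" idea from point 4 can be sketched as follows. This is a minimal sketch, assuming the page consists of a title plus a list of paragraph strings that are already HTML-escaped; the class name, method name and page layout are hypothetical. Only the current piece and the writer's buffer are in memory at any time, so no huge String (and none of the StringBuilder overhead from the footnote) is needed:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class StreamingHtml {

    // Writes the page piece by piece instead of building it as one String.
    // Assumes title and paragraphs are already HTML-escaped (hypothetical input).
    // The meta charset must match the charset of the Writer passed in.
    static void writePage(Writer out, String title, List<String> paragraphs) throws IOException {
        out.write("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>");
        out.write(title);
        out.write("</title></head>\n<body>\n");
        for (String p : paragraphs) {
            out.write("<p>");
            out.write(p);
            out.write("</p>\n");
        }
        out.write("</body></html>\n");
    }

    public static void main(String[] args) throws IOException {
        List<String> paragraphs = Arrays.asList("First paragraph.", "Second paragraph.");
        // Hypothetical output path; UTF-8 matches the meta charset written above.
        try (BufferedWriter w = Files.newBufferedWriter(Paths.get("book.html"), StandardCharsets.UTF_8)) {
            writePage(w, "Book", paragraphs);
        }
    }
}
```

Because writePage accepts any Writer, the same code can target a StringWriter in tests and a buffered file writer in production.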