Faster way to write files in Java

Time: 2014-11-05 07:40:10

Tags: java file-io filewriter bufferedwriter

I am using the function below to read a large file and then distribute its lines into several smaller files. A 100 MB file takes 13 minutes.

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;


public class DivideData {

public static void main(String[] args) throws IOException {
    Scanner data =  new Scanner(new File("D:\\P&G\\March Sample Data\\march.txt"));

    long startTime = System.currentTimeMillis();
    while(data.hasNextLine()){                          
        String line = data.nextLine();
        String[] split = line.split("\t");
        String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "+ split[1]+ ".txt";
        //System.out.println((filename));
        //System.out.println(line); 

        FileWriter fw = new FileWriter(filename,true); //the true will append the new data
        fw.write(line);//appends the string to the file
        fw.write('\n');
        fw.close();         

    }
    long stopTime = System.currentTimeMillis();
    System.out.println(stopTime - startTime);
    data.close();
    System.out.println("Data Successfully Divided!!");
}

}

I would like to know what I can do to reduce the time it takes.

4 answers:

Answer 0 (score: 3)

Open and close the FileWriter outside the loop:

FileWriter fw = new FileWriter(filename,true); // <-- here!
while(data.hasNextLine()){                          
    String line = data.nextLine();
    String[] split = line.split("\t");
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "
            + split[1]+ ".txt";
    //System.out.println((filename));
    //System.out.println(line); 
    // FileWriter fw = new FileWriter(filename,true);

Otherwise it has to open the file and seek to the end for every single line of input!

Edit

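I noticed that filename is not known until you are inside the loop, so a single writer opened before the loop will not work. Let's use a Map to keep a cache of writers, one per output file.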

FileWriter fw = null;
Map<String, FileWriter> map = new HashMap<>();
while (data.hasNextLine()) {
    String line = data.nextLine();
    String[] split = line.split("\t");
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "
            + split[1] + ".txt";
    // System.out.println((filename));
    // System.out.println(line);
    if (map.containsKey(filename)) {
        fw = map.get(filename);
    } else {
        fw = new FileWriter(filename, true);
        map.put(filename, fw);
    }
    // ...
}
for (FileWriter file : map.values()) {
    file.close();
}
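
For reference, here is a minimal self-contained sketch of the same writer-cache idea. The class name CachedWriterSplit, the split helper, the UTF-8 charset, and the choice to skip lines with fewer than two tab-separated columns are all assumptions made for illustration and are not part of the original question. It swaps FileWriter for a BufferedWriter opened in append mode.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class CachedWriterSplit {

    // Hypothetical helper: appends each line to "<col0> <col1>.txt" inside outDir,
    // opening every output file at most once.
    static void split(Iterable<String> lines, Path outDir) throws IOException {
        Map<String, BufferedWriter> writers = new HashMap<>();
        try {
            for (String line : lines) {
                String[] cols = line.split("\t", 3);
                if (cols.length < 2) {
                    continue; // assumption: ignore lines without two key columns
                }
                String name = cols[0] + " " + cols[1] + ".txt";
                BufferedWriter w = writers.get(name);
                if (w == null) {
                    // CREATE + APPEND mirrors new FileWriter(filename, true)
                    w = Files.newBufferedWriter(outDir.resolve(name), StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    writers.put(name, w);
                }
                w.write(line);
                w.write('\n');
            }
        } finally {
            // Closing flushes the buffers; a production version would also
            // keep closing the remaining writers if one close() fails.
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split-demo");
        split(Arrays.asList("a\tb\t1", "c\td\t2", "a\tb\t3"), dir);
        System.out.println(Files.readAllLines(dir.resolve("a b.txt")));
    }
}
```

Note that this keeps one open file per distinct key, so an input with a very large number of distinct keys could hit the operating system's open-file limit.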

Answer 1 (score: 2)

Similar to Elliot's solution, with some performance enhancements.

Map<String, PrintWriter> map = new LinkedHashMap<String, PrintWriter>(128, 0.7f, true) {
    protected boolean removeEldestEntry(Map.Entry<String, PrintWriter> eldest) {
        if (size() > 200) {
            eldest.getValue().close();
            return true;
        }
        return false;
    }
};

while (data.hasNextLine()) {
    String line = data.nextLine();
    // only split the first two as that is all we need.
    String[] split = line.split("\t", 3);
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " " + split[1] + ".txt";
    // get once, is faster than contains + get
    PrintWriter pw = map.get(filename);
    if (pw == null)
        // append mode, so a file that was evicted from the cache is not truncated when reopened
        map.put(filename, pw = new PrintWriter(new BufferedWriter(new FileWriter(filename, true))));
    // writing to a BufferedWriter is faster than flushing each line, 
    // unless the lines are very long.
    pw.println(line); // use system line separator.
}
for (Writer writer : map.values())
    writer.close();

This will be more efficient and will not run out of file descriptors.
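
A self-contained sketch of this bounded (LRU) writer cache might look like the following. The class name LruWriterSplit, the split helper, and the maxOpen parameter are hypothetical names introduced here. The sketch opens files in append mode so that a file reopened after eviction keeps its earlier lines, and it skips lines with fewer than two columns, which is an assumption rather than behaviour from the question.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class LruWriterSplit {

    // Hypothetical helper: keeps at most maxOpen writers open at a time.
    static void split(Iterable<String> lines, Path outDir, final int maxOpen) throws IOException {
        // accessOrder = true makes the map iterate from least to most recently used
        Map<String, PrintWriter> cache = new LinkedHashMap<String, PrintWriter>(128, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, PrintWriter> eldest) {
                if (size() > maxOpen) {
                    eldest.getValue().close(); // flush and release the file handle
                    return true;
                }
                return false;
            }
        };
        try {
            for (String line : lines) {
                String[] cols = line.split("\t", 3);
                if (cols.length < 2) {
                    continue; // assumption: ignore lines without two key columns
                }
                String name = cols[0] + " " + cols[1] + ".txt";
                PrintWriter pw = cache.get(name);
                if (pw == null) {
                    // append = true: reopening an evicted file must not truncate it
                    pw = new PrintWriter(new BufferedWriter(
                            new FileWriter(outDir.resolve(name).toFile(), true)));
                    cache.put(name, pw);
                }
                pw.println(line); // PrintWriter reports I/O errors via checkError(), not exceptions
            }
        } finally {
            for (PrintWriter pw : cache.values()) {
                pw.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lru-demo");
        // maxOpen = 1 forces an eviction and a reopen of "a b.txt"
        split(Arrays.asList("a\tb\t1", "c\td\t2", "a\tb\t3"), dir, 1);
        System.out.println(Files.readAllLines(dir.resolve("a b.txt")));
    }
}
```

The cap trades a few extra open/close calls for a hard bound on open file descriptors; a value such as 200 from the answer above is a tuning choice, not a requirement.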

Answer 2 (score: 1)

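Don't open and close the file on every iteration of the loop. Open it before the loop and close it afterwards. You'll find this is an order of magnitude faster.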

Answer 3 (score: 0)

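Could you use BufferedReader and BufferedWriter for this? I suspect it would be faster. It also looks like you reopen the writer on every iteration of the loop. Added: a larger heap size might also help performance a lot.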
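
One way to read this suggestion, sketched under the assumption that the whole input fits in the heap (plausible for a 100 MB file, but not stated in the question): read with a BufferedReader, group the lines by output file in memory, then write each file once through a BufferedWriter. The class name BufferedSplit, the split helper, and the UTF-8 charset are hypothetical choices for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BufferedSplit {

    // Hypothetical helper: groups lines in memory, then writes each output file once.
    static void split(Path input, Path outDir) throws IOException {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t", 3);
                if (cols.length < 2) {
                    continue; // assumption: ignore lines without two key columns
                }
                String name = cols[0] + " " + cols[1] + ".txt";
                groups.computeIfAbsent(name, k -> new ArrayList<>()).add(line);
            }
        }
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            // Only one output file is open at any moment.
            try (BufferedWriter out = Files.newBufferedWriter(outDir.resolve(entry.getKey()),
                    StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                for (String l : entry.getValue()) {
                    out.write(l);
                    out.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("buffered-demo");
        Path input = dir.resolve("input.txt");
        Files.write(input, Arrays.asList("a\tb\t1", "c\td\t2", "a\tb\t3"), StandardCharsets.UTF_8);
        split(input, dir);
        System.out.println(Files.readAllLines(dir.resolve("a b.txt")));
    }
}
```

This is where a larger heap could matter: the grouped lines are held in memory until they are written, so inputs much larger than the available heap would need one of the streaming approaches from the other answers instead.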