I am using the function below to read a large file and distribute its lines across several smaller files. A 100 MB file takes 13 minutes.
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class DivideData {
    public static void main(String[] args) throws IOException {
        Scanner data = new Scanner(new File("D:\\P&G\\March Sample Data\\march.txt"));
        long startTime = System.currentTimeMillis();
        while(data.hasNextLine()){
            String line = data.nextLine();
            String[] split = line.split("\t");
            String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "+ split[1]+ ".txt";
            //System.out.println((filename));
            //System.out.println(line);
            FileWriter fw = new FileWriter(filename,true); //the true will append the new data
            fw.write(line);//appends the string to the file
            fw.write('\n');
            fw.close();
        }
        long stopTime = System.currentTimeMillis();
        System.out.println(stopTime - startTime);
        data.close();
        System.out.println("Data Successfully Divided!!");
    }
}
I would like to know what I can do to reduce the time this takes.
Answer 0: (score: 3)
Open and close the FileWriter outside the loop,
FileWriter fw = new FileWriter(filename,true); // <-- here!
while(data.hasNextLine()){
    String line = data.nextLine();
    String[] split = line.split("\t");
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "
            + split[1]+ ".txt";
    //System.out.println((filename));
    //System.out.println(line);
    // FileWriter fw = new FileWriter(filename,true);
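Otherwise it has to open the file and seek to the end for every line of input!
Edit
I noticed that you don't have filename until you are inside your loop. Let's use a Map to keep a cache of writers.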
FileWriter fw = null;
Map<String, FileWriter> map = new HashMap<>();
while (data.hasNextLine()) {
    String line = data.nextLine();
    String[] split = line.split("\t");
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " "
            + split[1] + ".txt";
    // System.out.println((filename));
    // System.out.println(line);
    if (map.containsKey(filename)) {
        fw = map.get(filename);
    } else {
        fw = new FileWriter(filename, true);
        map.put(filename, fw);
    }
    // ...
}
for (FileWriter file : map.values()) {
    file.close();
}
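For reference, the caching idea above can be put together with a write step and cleanup as follows. This is only a minimal sketch: the class name `WriterCacheSplitter`, the method `split`, and its `input`/`outDir` parameters are made up for illustration, and it assumes every line has at least two tab-separated fields and that the number of distinct output files stays below the operating system's open-file limit.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper, not part of the original answer.
class WriterCacheSplitter {

    // Splits `input` into files named "<field0> <field1>.txt" inside `outDir`.
    static void split(File input, File outDir) throws IOException {
        Map<String, FileWriter> writers = new HashMap<>();
        try (Scanner data = new Scanner(input)) {
            while (data.hasNextLine()) {
                String line = data.nextLine();
                String[] split = line.split("\t");
                String filename = split[0] + " " + split[1] + ".txt";
                FileWriter fw = writers.get(filename);
                if (fw == null) {
                    // First time this key is seen: open once, in append mode.
                    fw = new FileWriter(new File(outDir, filename), true);
                    writers.put(filename, fw);
                }
                fw.write(line);
                fw.write('\n');
            }
        } finally {
            // Close the cached writers even if the loop above threw.
            for (FileWriter fw : writers.values()) {
                fw.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: WriterCacheSplitter <input file> <output dir>");
            return;
        }
        split(new File(args[0]), new File(args[1]));
    }
}
```

Each output file is opened once, when its key first appears, and stays open until the whole input has been read.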
Answer 1: (score: 2)
Similar to Elliot's solution, with some performance enhancements.
Map<String, PrintWriter> map = new LinkedHashMap<String, PrintWriter>(128, 0.7f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, PrintWriter> eldest) {
        if (size() > 200) {
            eldest.getValue().close();
            return true;
        }
        return false;
    }
};
while (data.hasNextLine()) {
    String line = data.nextLine();
    // only split off the first two fields, as that is all we need.
    String[] split = line.split("\t", 3);
    String filename = "D:\\P&G\\March Sample Data\\" + split[0] + " " + split[1] + ".txt";
    // a single get is faster than contains + get
    PrintWriter pw = map.get(filename);
    if (pw == null)
        // open in append mode so a file reopened after eviction is not truncated
        map.put(filename, pw = new PrintWriter(new BufferedWriter(new FileWriter(filename, true))));
    // writing to a BufferedWriter is faster than flushing each line,
    // unless the lines are very long.
    pw.println(line); // use system line separator.
}
for (Writer writer : map.values())
    writer.close();
This will be more efficient, and it won't run out of file descriptors.
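The eviction relies on two documented `LinkedHashMap` features: the three-argument constructor with `accessOrder` set to `true`, and the `removeEldestEntry` hook that is called after each insertion. A small sketch with strings in place of writers makes the behaviour visible. The class `LruDemo`, the helper `newLruCache`, the `evicted` list, and the capacity of 2 are illustrative choices only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; strings stand in for the PrintWriter values.
class LruDemo {

    // Records the keys that were evicted, in eviction order.
    static final List<String> evicted = new ArrayList<>();

    // Builds a map that keeps at most `capacity` entries, dropping the
    // least recently accessed one when a new entry pushes it over.
    static Map<String, String> newLruCache(final int capacity) {
        return new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > capacity) {
                    evicted.add(eldest.getKey()); // the file-splitting code closes the writer here
                    return true;
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = newLruCache(2);
        cache.put("a", "writer-a");
        cache.put("b", "writer-b");
        cache.get("a");              // touch "a", so "b" becomes the eldest
        cache.put("c", "writer-c");  // exceeds capacity, evicts "b"
        System.out.println(cache.keySet() + " evicted=" + evicted);
    }
}
```

Running `main` prints `[a, c] evicted=[b]`. Because an evicted file may be needed again later in the input, a writer that is reopened after eviction has to append rather than overwrite, or the lines written before the eviction are lost.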
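Answer 2: (score: 1)
Don't open and close the file on every iteration of the loop. Open it before the loop and close it afterwards. You will find this is an order of magnitude faster.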
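Answer 3: (score: 0)
Could you use BufferedReader & BufferedWriter for this? I guess it might be faster. It also looks like you reopen the writer inside the loop? // Added: a larger heap size might help performance a lot.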
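A sketch of what that could look like, combining a `BufferedReader` for the input with one cached `BufferedWriter` per output file so that no writer is reopened inside the loop. The class `BufferedSplitter`, the method `split`, and its parameters are hypothetical names. The code also assumes UTF-8 input, at least two tab-separated fields per line, and few enough distinct output files that they can all be open at once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not taken from any of the answers.
class BufferedSplitter {

    // Writes each line of `input` to "<field0> <field1>.txt" inside `outDir`.
    static void split(Path input, Path outDir) throws IOException {
        Map<String, BufferedWriter> writers = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Only the first two fields are needed for the file name.
                String[] split = line.split("\t", 3);
                String filename = split[0] + " " + split[1] + ".txt";
                BufferedWriter bw = writers.get(filename);
                if (bw == null) {
                    // Open once per output file; create it if missing, append otherwise.
                    bw = Files.newBufferedWriter(outDir.resolve(filename), StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    writers.put(filename, bw);
                }
                bw.write(line);
                bw.newLine(); // system line separator
            }
        } finally {
            // Closing flushes whatever is still sitting in each buffer.
            for (BufferedWriter bw : writers.values()) {
                bw.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: BufferedSplitter <input file> <output dir>");
            return;
        }
        split(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Buffering means most `write` calls only copy characters into memory, and the operating system is touched when a buffer fills up or a writer is closed.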