I wrote this code:
// numbers is a List<Integer> declared earlier in main
try (BufferedReader file = new BufferedReader(new FileReader("C:\\Users\\User\\Desktop\\big50m.txt"))) {
    String line;
    StringTokenizer st;
    while ((line = file.readLine()) != null) {
        st = new StringTokenizer(line); // Split the file line into integer tokens
        while (st.hasMoreTokens()) {
            numbers.add(Integer.parseInt(st.nextToken())); // Convert and add to the list of numbers
        }
    }
} catch (Exception e) {
    System.out.println("Can't read the file...");
}
The big50m file contains 50,000,000 integers, and I get this runtime error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuffer.append(StringBuffer.java:367)
at java.io.BufferedReader.readLine(BufferedReader.java:370)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at unsortedfilesapp.UnsortedFilesApp.main(UnsortedFilesApp.java:37)
C:\Users\User\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 5 seconds)
I think the problem is the String variable named line. Can you tell me how to fix this? I use StringTokenizer because I want fast reading.
Answer 0 (score: 1)
Create a BufferedReader from the file and read() it char by char. Append each digit char to a String, then call Integer.parseInt() on it; skip any non-digit char and continue parsing at the next digit, and so on.
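A minimal sketch of the char-by-char approach described above. It assumes the input arrives as a Reader (for the real file that would be new FileReader(path); a StringReader is used here for illustration), and the class and method names are hypothetical. Like the description, it treats '-' as a separator, so negative numbers are not supported. A StringBuilder is reused instead of building a new String per digit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CharByCharParser {

    // Reads one char at a time: digits are collected, any other char ends the current number.
    // ASSUMPTION: unsigned integers only; '-' is treated like any other separator.
    static List<Integer> parse(Reader source) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        StringBuilder digits = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                if (ch >= '0' && ch <= '9') {
                    digits.append((char) ch);
                } else if (digits.length() > 0) {
                    numbers.add(Integer.parseInt(digits.toString()));
                    digits.setLength(0); // reuse the builder for the next number
                }
            }
        }
        if (digits.length() > 0) { // the input may end without a trailing separator
            numbers.add(Integer.parseInt(digits.toString()));
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse(new StringReader("12 7\n345  8")));
    }
}
```

Because BufferedReader.read() serves characters from an internal buffer, reading one char at a time does not mean one disk access per char, and memory no longer depends on line length. The numbers still end up in a List<Integer>, so the heap used by the list itself is unchanged.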
Answer 1 (score: 0)
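Here is a version that minimizes memory usage: no byte-to-char conversion and no String manipulation. However, this version does not handle negative numbers.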
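import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ByteParser {
    public static void main(final String[] a) {
        // A List keeps duplicates; a HashSet would drop them and costs more memory per element
        final List<Integer> numbers = new ArrayList<>();
        int v = 0;           // value of the number currently being parsed
        boolean use = false; // true while inside a run of digits
        int c;
        // Reading bytes avoids char conversion; the buffer avoids one system call per byte
        try (InputStream s = new BufferedInputStream(new FileInputStream("C:\\Users\\User\\Desktop\\big50m.txt"))) {
            // No String or char[] allocation in the loop (only Integer boxing in add)
            while ((c = s.read()) != -1) {
                if (c >= '0' && c <= '9') {
                    v = v * 10 + (c - '0');
                    use = true;
                } else if (use) {
                    numbers.add(v); // a non-digit ends the current number
                    use = false;
                    v = 0;
                }
            }
            if (use) numbers.add(v); // the file may not end with a separator
        } catch (final IOException e) {
            System.out.println("Can't read the file...");
        }
    }
}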
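Since the answer notes that negative numbers are not handled, here is one possible extension of the same byte-level loop, offered as a sketch rather than a tested drop-in. Assumptions: ASCII digits, whitespace separators, a '-' counts only when it is immediately followed by digits, and values fit in an int (there is no overflow check). SignedByteParser is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SignedByteParser {

    // Same state machine as the answer above, plus a sign flag.
    // ASSUMPTION: no overflow checking; every value must fit in an int.
    static List<Integer> parse(InputStream in) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        int v = 0;
        boolean inNumber = false;
        boolean negative = false;
        int c;
        try (InputStream s = new BufferedInputStream(in)) {
            while ((c = s.read()) != -1) {
                if (c >= '0' && c <= '9') {
                    v = v * 10 + (c - '0');
                    inNumber = true;
                } else {
                    if (inNumber) numbers.add(negative ? -v : v);
                    inNumber = false;
                    v = 0;
                    negative = (c == '-'); // a '-' only applies to the digits right after it
                }
            }
        }
        if (inNumber) numbers.add(negative ? -v : v); // last number without a trailing separator
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "12 -7\n-345 8".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parse(new ByteArrayInputStream(sample)));
    }
}
```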
Answer 2 (score: 0)
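The readLine() method reads the whole line at once and therefore uses a lot of memory. This is very inefficient and does not scale to arbitrarily large files.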
You can use StreamTokenizer like this:
try (Reader reader = new BufferedReader(new FileReader("bigfile.txt"))) {
    StreamTokenizer tokenizer = new StreamTokenizer(reader);
    tokenizer.parseNumbers(); // default behaviour
    while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
        if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
            numbers.add((int) Math.round(tokenizer.nval));
        }
    }
}
I haven't tested this code, but it gives you the general idea.
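Since the snippet above is untested, here is a self-contained version of the same StreamTokenizer loop that can be run against a small in-memory input. Wrapping the source in a BufferedReader and the names StreamTokenizerDemo and readInts are choices made for this sketch, not part of the answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerDemo {

    // Tokenizes the stream piece by piece instead of materializing whole lines.
    static List<Integer> readInts(Reader source) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        try (Reader reader = new BufferedReader(source)) {
            StreamTokenizer tokenizer = new StreamTokenizer(reader);
            // parseNumbers() is on by default; numeric tokens are stored in the double field nval
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    // every int value is exactly representable as a double, so this is lossless
                    numbers.add((int) Math.round(tokenizer.nval));
                }
            }
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readInts(new StringReader("12 -7 345\n8")));
    }
}
```

StreamTokenizer also recognizes a leading '-' as part of a number, so negative values work here. By default it treats '/' as a comment character, which does not matter for a file of plain integers.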
Answer 3 (score: 0)
The provided snippet works when the program is run with -Xmx2048m (with one tweak: numbers declared as List<Integer> numbers = new ArrayList<>(50000000);).
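A rough estimate shows why a larger heap is needed just for the list. Assuming a 64-bit HotSpot JVM with compressed object pointers (about 16 bytes per Integer object and 4 bytes per reference in the backing array), 50,000,000 boxed values need about 50,000,000 × 20 = 1,000,000,000 bytes, close to 1 GB, versus 200,000,000 bytes for a plain int[]. These per-object sizes are assumptions and differ between JVMs and settings; the sketch below just does the arithmetic and prints the configured maximum heap for comparison.

```java
public class HeapEstimate {

    // Rough size of an ArrayList<Integer> holding `count` distinct boxed values.
    // ASSUMPTION: sizes of one Integer object and one reference are passed in; they are JVM-dependent.
    static long estimateBoxedListBytes(long count, int integerObjectBytes, int referenceBytes) {
        return count * (integerObjectBytes + referenceBytes);
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // ASSUMPTION: 16-byte Integer, 4-byte compressed reference (64-bit HotSpot defaults)
        long bytes = estimateBoxedListBytes(50_000_000L, 16, 4);
        System.out.println("estimated list size: " + bytes / mb + " MB");
        System.out.println("max heap (-Xmx):     " + Runtime.getRuntime().maxMemory() / mb + " MB");
    }
}
```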
Answer 4 (score: 0)
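Since all the numbers are on a single line, the BufferedReader approach cannot work or scale: the complete file is read into memory as one line. So a streaming approach (e.g. the one from @whbogado) is really the way to go.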
StreamTokenizer tokenizer = new StreamTokenizer(new FileReader("bigfile.txt"));
tokenizer.parseNumbers(); // default behaviour
while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
    if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
        numbers.add((int) Math.round(tokenizer.nval));
    }
}
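As you wrote, you still get a heap space error with this, and I think that is no longer a streaming problem. Unfortunately, you store all the values in a List, and I think that is the problem now. You said in a comment that you don't know the actual number of values. So you should avoid storing them in a list and do some kind of streaming processing here as well.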
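To illustrate what "streaming processing" could mean here: if the end result only needs aggregate values, each number can be consumed as soon as it is tokenized, so memory stays constant no matter how many numbers the file holds. The count/sum/min/max below are hypothetical placeholders for whatever processing the program actually needs; StreamingStats and Stats are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class StreamingStats {

    // Hypothetical result holder: a few primitives, independent of how many numbers are read.
    static final class Stats {
        long count;
        long sum;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
    }

    // Each number is processed immediately; nothing is stored in a collection.
    static Stats scan(Reader source) throws IOException {
        Stats stats = new Stats();
        try (Reader reader = new BufferedReader(source)) {
            StreamTokenizer tokenizer = new StreamTokenizer(reader);
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    int value = (int) Math.round(tokenizer.nval);
                    stats.count++;
                    stats.sum += value; // a long sum cannot overflow for 50 million ints
                    stats.min = Math.min(stats.min, value);
                    stats.max = Math.max(stats.max, value);
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        Stats s = scan(new StringReader("12 -7 345\n8"));
        System.out.println("count=" + s.count + " sum=" + s.sum + " min=" + s.min + " max=" + s.max);
    }
}
```

For a real run, pass new FileReader(path) instead of the StringReader.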
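For anyone interested, here is my small test code (Java 8), which generates a test file of the desired size USED_INT_VALUES. I have limited it to 5,000,000 integers for now. As you can see, memory increases steadily while the file is being read. The only place that holds that much memory is the numbers List.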
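Note that initializing an ArrayList with an initial capacity does not allocate the memory needed to store the objects themselves, in your case Integers; it only reserves the internal array of references to them.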
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class TestBigFiles {
    public static void main(String[] args) throws IOException {
        heapStatistics("program start");
        final int USED_INT_VALUES = 5000000;
        File tempFile = File.createTempFile("testdata_big_50m", ".txt");
        System.out.println("using file " + tempFile.getAbsolutePath());
        tempFile.deleteOnExit();
        Random rand = new Random();
        // Write USED_INT_VALUES random ints separated by spaces, all on a single line
        try (Writer writer = new BufferedWriter(new FileWriter(tempFile))) {
            for (int i = 0; i < USED_INT_VALUES; i++) {
                writer.write(rand.nextInt() + " ");
            }
        }
        heapStatistics("large file generated - size=" + tempFile.length() + " bytes");
        List<Integer> numbers = new ArrayList<>(USED_INT_VALUES);
        heapStatistics("large array allocated (to avoid array copy)");
        int c = 0;
        try (FileReader fileReader = new FileReader(tempFile)) {
            StreamTokenizer tokenizer = new StreamTokenizer(fileReader);
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    numbers.add((int) tokenizer.nval);
                    c++;
                    // report heap usage every 100,000 parsed numbers
                    if (c % 100000 == 0) {
                        heapStatistics("within loop count " + c);
                    }
                }
            }
        }
        heapStatistics("large file parsed, number list size is " + numbers.size());
    }

    private static void heapStatistics(String message) {
        final int MEGABYTE = 1024 * 1024;
        // request a GC so "used" mostly reflects live objects (System.gc() is only a hint)
        System.gc();
        Runtime runtime = Runtime.getRuntime();
        System.out.println("##### " + message + " #####");
        System.out.println("Used Memory:" + (runtime.totalMemory() - runtime.freeMemory()) / MEGABYTE + "MB"
                + " Free Memory:" + runtime.freeMemory() / MEGABYTE + "MB"
                + " Total Memory:" + runtime.totalMemory() / MEGABYTE + "MB"
                + " Max Memory:" + runtime.maxMemory() / MEGABYTE + "MB");
    }
}
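Following the note that an ArrayList's initial capacity only reserves references, one way to avoid the per-Integer objects entirely is to store primitives in a growable int[], which needs about 4 bytes per value. IntArrayList below is a hypothetical helper written for this sketch, not a JDK class (third-party primitive-collection libraries offer similar ones). It has no overflow guard for sizes near Integer.MAX_VALUE.

```java
import java.util.Arrays;

public class IntArrayList {
    private int[] data;
    private int size;

    public IntArrayList(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    // Appends a primitive int (no boxing); grows the backing array by about 1.5x when full.
    public void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length + (data.length >> 1) + 1);
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }

    // Returns a trimmed copy of the stored values, e.g. for Arrays.sort.
    public int[] toArray() {
        return Arrays.copyOf(data, size);
    }

    public static void main(String[] args) {
        IntArrayList list = new IntArrayList(2);
        for (int i = 0; i < 5; i++) {
            list.add(i * i);
        }
        System.out.println(list.size() + " " + Arrays.toString(list.toArray()));
    }
}
```

In the test program above, List<Integer> numbers = new ArrayList<>(USED_INT_VALUES) could be swapped for new IntArrayList(USED_INT_VALUES); numbers.add(...) and numbers.size() keep working. Pre-sizing matters here too, because each growth step briefly holds both the old and the new array.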