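I am getting java.lang.OutOfMemoryError: GC overhead limit exceeded while reading from a text file. I cannot tell what is going wrong. I am running the program on a cluster with plenty of memory. The outer loop iterates 16,000 times, and for each outer iteration the inner loop iterates about 300,000 times. The error is thrown when the code tries to read a line in the inner loop. Any suggestions would be greatly appreciated. Here is my code snippet: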
//Read from the test data output file till not equals null
//Reads a single line at a time from the test data
while((line=br.readLine())!=null)
{
    //Clears the hashmap
    leastFive.clear();
    //Clears the arraylist
    fiveTrainURLs.clear();
    try
    {
        StringTokenizer st=new StringTokenizer(line," ");
        while(st.hasMoreTokens())
        {
            String currentToken=st.nextToken();
            if(currentToken.contains("File"))
            {
                testDataFileNo=st.nextToken();
                String tok="";
                while((tok=st.nextToken())!=null)
                {
                    if (tok==null) break;
                    int topic_no=Integer.parseInt(tok);
                    topic_no=Integer.parseInt(tok);
                    String prob=st.nextToken();
                    //Obtains the double value of the probability
                    double double_prob=Double.parseDouble(prob);
                    p1[topic_no]=double_prob;
                }
                break;
            }
        }
    }
    catch(Exception e)
    {
    }
    //Used to read over all the training data file
    FileReader fr1=new FileReader("/homes/output_train_2000.txt");
    BufferedReader br1=new BufferedReader(fr1);
    String line1="";
    //Reads the training data output file,one row at a time
    //This is the line on which an exception occurs!
    while((line1=br1.readLine())!=null)
    {
        try
        {
            StringTokenizer st=new StringTokenizer(line1," ");
            while(st.hasMoreTokens())
            {
                String currentToken=st.nextToken();
                if(currentToken.contains("File"))
                {
                    trainDataFileNo=st.nextToken();
                    String tok="";
                    while((tok=st.nextToken())!=null)
                    {
                        if(tok==null)
                            break;
                        int topic_no=Integer.parseInt(tok);
                        topic_no=Integer.parseInt(tok);
                        String prob=st.nextToken();
                        double double_prob=Double.parseDouble(prob);
                        //p2 will contain the probability values of each of the topics based on the indices
                        p2[topic_no]=double_prob;
                    }
                    break;
                }
            }
        }
        catch(Exception e)
        {
            double result=klDivergence(p1,p2);
            leastFive.put(trainDataFileNo,result);
        }
    }
}
Answer 0 (score: 3)
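16000 * 300000 = 4.8 billion. Even if each token took only 6 bytes, that alone comes to more than 24 GB (about 28.8 GB). Once the garbage collector finally kicks in on a heap that size, it will run for a very long time. It looks like you need to break the work into smaller chunks. You could also cap the application's heap at something reasonable, such as 1 GB, so that GC kicks in sooner and is actually able to finish its work.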
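For reference, the estimate above can be reproduced with a few lines of Java. This is only a sketch of the arithmetic: the class and method names (IterationEstimate, totalIterations, totalBytes) are made up for illustration, and the 6 bytes per token is the rough assumption from this answer, not a measured value.

```java
// Hypothetical helper, not part of the asker's program.
class IterationEstimate {

    // Total number of inner-loop passes. The product is computed in long
    // arithmetic because 16000 * 300000 does not fit in an int.
    static long totalIterations(long outer, long inner) {
        return Math.multiplyExact(outer, inner);
    }

    // Bytes allocated if every pass creates one token of the given size.
    static long totalBytes(long iterations, long bytesPerToken) {
        return Math.multiplyExact(iterations, bytesPerToken);
    }

    public static void main(String[] args) {
        long iterations = totalIterations(16_000L, 300_000L);
        // 6 bytes per token is the assumed figure from the answer text.
        long bytes = totalBytes(iterations, 6L);
        System.out.println("inner-loop passes: " + iterations);
        System.out.println("bytes allocated:   " + bytes);
    }
}
```

To try the 1 GB cap, the standard JVM option is -Xmx, for example `java -Xmx1g MainClass`, where MainClass is a placeholder for your own entry point.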