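My Java program uses too much memory. After it has run for a while, CPU usage reaches 100% and both the program and the system stop responding. I tried increasing the Java heap size, but that did not help.
If anyone knows what is going wrong, please help me.
Here is the code I am running (a part-of-speech tagger trained on the Brown Corpus):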
public void readBrownCorpus(String corpusPath) throws IOException {
    BufferedReader inputStream = null;
    try {
        inputStream = new BufferedReader(new FileReader(corpusPath));
        String corpusData = inputStream.readLine();
        String previousTag = "^";
        String[] wordWithTag = corpusData.split(" ");
        for (int i = 0; i < wordWithTag.length; i++) {
            String[] word = wordWithTag[i].split("_");
            if (word != null && word.length != 2)
                throw new Exception("Error in the Format of Corpus");
            // If new tag found, insert this in both transitionTable and
            // emissionTable
            if (transitionTable.get(word[1]) == null) {
                insertTagInTransitionTable(word[1]);
                insertTagInEmissionTable(word[1]);
            }
            if (emissionTable.get(word[0]) == null) {
                insertWordinEmissionTable(word[0]);
            }
            updateTranstionTable(previousTag, word[1]);
            updateEmissionTable(word[0], word[1]);
            if (word[1].equals(".")) {
                previousTag = "^";
            } else {
                previousTag = word[1];
            }
            System.out.println(transitionTable.size());
        }
    } catch (IOException ioException) {
        ioException.printStackTrace();
    } catch (Exception exception) {
        exception.printStackTrace();
    } finally {
        if (inputStream != null)
            inputStream.close();
    }
}
Here are the other methods:
// This is used to insert the newly found tag in the transition table
private void insertTagInTransitionTable(String tag) throws CloneNotSupportedException
{
    for (String key : transitionTable.keySet())
    {
        Row row = transitionTable.get(key);
        row.tagCount.put(tag, 0f);
    }
    // get a row from the transition table
    Row newRow = (Row) transitionTable.get("^").Clone();
    for (String key : newRow.tagCount.keySet())
    {
        newRow.tagCount.put(key, 0f);
    }
    transitionTable.put(tag, newRow);
}

// This is used to insert the newly found tag in the emissionTable
private void insertTagInEmissionTable(String tag)
{
    for (String key : emissionTable.keySet())
    {
        Row row = emissionTable.get(key);
        row.tagCount.put(tag, 0f);
    }
}

// This method inserts the word in the emission table
private void insertWordinEmissionTable(String word) throws CloneNotSupportedException
{
    // get a row from the emission table
    Row newRow = (Row) emissionTable.get("#c1").Clone();
    for (String key : newRow.tagCount.keySet())
    {
        newRow.tagCount.put(key, 0f);
    }
    emissionTable.put(word, newRow);
}

// This method is used to update the transitionTable
private void updateTranstionTable(String previousTag, String currentTag)
{
    Row row = transitionTable.get(previousTag);
    row.tagCount.put(currentTag, row.tagCount.get(currentTag) + 1);
}

// This method is used to update the emission table
private void updateEmissionTable(String word, String tag)
{
    Row row = emissionTable.get(word);
    row.tagCount.put(tag, row.tagCount.get(tag) + 1);
}
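Answer 0 (score: 2)
I don't have the full code or data, so these suggestions may not fix the problem, but I can see places that could be improved:
Code like this, which loops over every existing entry and sets a count to 0 each time a new tag is found, will only get slower as the tables grow. Remove it, and later on treat a missing entry as 0.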
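// Remove this:
for (String key : transitionTable.keySet())
{
    Row row = transitionTable.get(key);
    row.tagCount.put(tag, 0f);
}

// Handle the missing entry later on instead.
// Note: this assumes Row.tagCount is changed to hold Integer values;
// the code in the question stores Float values (0f).
private void updateTranstionTable(String previousTag, String currentTag)
{
    Row row = transitionTable.get(previousTag);
    Integer tagCount = row.tagCount.get(currentTag);
    // A missing entry counts as 0, so the first occurrence stores 1.
    int newTagCount = tagCount == null ? 1 : tagCount.intValue() + 1;
    row.tagCount.put(currentTag, newTagCount);
}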
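This is more memory-efficient because you no longer store a load of 0 entries that will never be incremented. It also saves the time spent putting those 0 values into the Map.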
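To make the "treat absent as 0" idea concrete, here is a minimal sketch of a sparse transition table. The class name `SparseTransitionCounts` and its methods are hypothetical and are not part of the question's code; the sketch also assumes integer counts and replaces the `Row` class with a plain nested map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse replacement for the transition table:
// previous tag -> (current tag -> count). Pairs never seen are simply absent.
class SparseTransitionCounts {
    private final Map<String, Map<String, Integer>> table = new HashMap<>();

    // Create the row on first use, then add 1 to the pair's count.
    void increment(String previousTag, String currentTag) {
        table.computeIfAbsent(previousTag, k -> new HashMap<>())
             .merge(currentTag, 1, Integer::sum);
    }

    // A missing row or a missing entry is read as 0.
    int count(String previousTag, String currentTag) {
        Map<String, Integer> row = table.get(previousTag);
        return row == null ? 0 : row.getOrDefault(currentTag, 0);
    }

    public static void main(String[] args) {
        SparseTransitionCounts counts = new SparseTransitionCounts();
        counts.increment("^", "at");
        counts.increment("at", "nn");
        counts.increment("^", "at");
        System.out.println(counts.count("^", "at"));  // prints 2
        System.out.println(counts.count("nn", "at")); // prints 0
    }
}
```

With this layout, finding a new tag costs nothing up front: no existing row is touched, and memory grows only with the tag pairs that actually occur in the corpus.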
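Answer 1 (score: 1)
I don't think Map is the right data structure for this task. Multiset from the Guava library does a good job of counting elements. As the official documentation points out, this kind of code: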
Map<String, Integer> counts = new HashMap<String, Integer>();
for (String word : words) {
    Integer count = counts.get(word);
    if (count == null) {
        counts.put(word, 1);
    } else {
        counts.put(word, count + 1);
    }
}
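can always be rewritten using a Multiset, whose count() method makes it easy to check how many times a particular element has been added to the data structure.
This might improve your performance. But as suggested earlier, you should also run the program under a profiler and review the code to find out which parts consume the most resources.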
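To show roughly what the Multiset style looks like without pulling in a dependency, here is a small stand-in written with the standard library only. `SimpleMultiset` is a hypothetical class made up for this sketch and is not Guava's implementation; with Guava on the classpath you would instead create a `HashMultiset` and call its `add` and `count` methods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal multiset: it only tracks how often each element was added.
class SimpleMultiset<E> {
    private final Map<E, Integer> counts = new HashMap<>();

    // Record one more occurrence of the element.
    void add(E element) {
        counts.merge(element, 1, Integer::sum);
    }

    // Number of times the element was added; 0 if it was never added.
    int count(E element) {
        return counts.getOrDefault(element, 0);
    }

    public static void main(String[] args) {
        SimpleMultiset<String> words = new SimpleMultiset<>();
        for (String word : "the dog saw the cat".split(" ")) {
            words.add(word);
        }
        System.out.println(words.count("the"));  // prints 2
        System.out.println(words.count("bird")); // prints 0
    }
}
```

The calling code no longer contains the null check and the put: it only adds elements and asks for counts, which is the simplification the answer is pointing at.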