Question

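My Java program needs too much memory. When I run it, after a while CPU usage reaches 100% and both the program and the system stop responding. I tried increasing the Java heap size, but it didn't help.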

If anyone knows what is going wrong, please help me.

This is the code I'm running (it is a part-of-speech tagger trainer that learns from the Brown corpus):

public void readBrownCorpus(String corpusPath) throws IOException {
    BufferedReader inputStream = null;
    try {
        inputStream = new BufferedReader(new FileReader(corpusPath));
        String corpusData = inputStream.readLine();
        String previousTag = "^";
        String wordWithTag[] = corpusData.split(" ");
        for (int i = 0; i < wordWithTag.length; i++) {
            String word[] = wordWithTag[i].split("_");
            if (word != null && word.length != 2)
                throw new Exception("Error in the Format of Corpus");
            // If a new tag is found, insert it in both transitionTable and
            // emissionTable
            if (transitionTable.get(word[1]) == null) {
                insertTagInTransitionTable(word[1]);
                insertTagInEmissionTable(word[1]);
            }
            if (emissionTable.get(word[0]) == null) {
                insertWordinEmissionTable(word[0]);
            }

            updateTranstionTable(previousTag, word[1]);
            updateEmissionTable(word[0], word[1]);
            if (word[1].equals(".")) {
                previousTag = "^";
            } else {
                previousTag = word[1];
            }
            System.out.println(transitionTable.size());
        }
    } catch (IOException ioException) {
        ioException.printStackTrace();
    } catch (Exception exception) {
        exception.printStackTrace();
    } finally {
        if (inputStream != null)
            inputStream.close();
    }
}

Here are the other functions:

// This is used to insert the newly found tag in the transition table
private void insertTagInTransitionTable(String tag) throws CloneNotSupportedException
{
    for (String key : transitionTable.keySet())
    {
        Row row = transitionTable.get(key);
        row.tagCount.put(tag, 0f);
    }
    // get a row from the transition table
    Row newRow = (Row) transitionTable.get("^").Clone();
    for (String key : newRow.tagCount.keySet())
    {
        newRow.tagCount.put(key, 0f);
    }
    transitionTable.put(tag, newRow);
}

// This is used to insert the newly found tag in the emission table
private void insertTagInEmissionTable(String tag)
{
    for (String key : emissionTable.keySet())
    {
        Row row = emissionTable.get(key);
        row.tagCount.put(tag, 0f);
    }
}

// This method inserts the word in the emission table
private void insertWordinEmissionTable(String word) throws CloneNotSupportedException
{
    // get a row from the emission table
    Row newRow = (Row) emissionTable.get("#c1").Clone();
    for (String key : newRow.tagCount.keySet())
    {
        newRow.tagCount.put(key, 0f);
    }
    emissionTable.put(word, newRow);
}

// This method is used to update the transition table
private void updateTranstionTable(String previousTag, String currentTag)
{
    Row row = transitionTable.get(previousTag);
    row.tagCount.put(currentTag, row.tagCount.get(currentTag) + 1);
}

// This method is used to update the emission table
private void updateEmissionTable(String word, String tag)
{
    Row row = emissionTable.get(word);
    row.tagCount.put(tag, row.tagCount.get(tag) + 1);
}

Answer 1

I don't have your full code or data, so these suggestions may not solve the problem, but I can see some things that need improving:

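Code like this, which loops over every entry seen so far just to set a count of 0, will only get slower and slower as the tables grow. Remove it, and instead handle the missing entry later on by treating it as 0.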

//remove this
for(String key : transitionTable.keySet())
{
    Row row = transitionTable.get(key);
    row.tagCount.put(tag, 0f);
}

//Handle later on:
private void updateTranstionTable(String previousTag,String currentTag)
{
    Row row = transitionTable.get(previousTag);
    // tagCount holds Float values (the question stores 0f); a missing entry means a count of 0
    Float tagCount = row.tagCount.get(currentTag);
    float newTagCount = tagCount == null ? 1f : tagCount + 1f;
    row.tagCount.put(currentTag, newTagCount);
}

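This is more memory-efficient, because you no longer store a huge number of 0 entries that are never incremented. It also saves the time spent putting all those 0s into the Map.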
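To make the "absent means 0" idea concrete, here is a minimal stand-alone sketch. It does not use the question's Row class (its definition isn't shown); instead it assumes a hypothetical SparseTransitionTable backed by nested HashMaps with Float counts, and uses Java 8's Map.merge and Map.getOrDefault in place of the explicit null check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse replacement for the question's transitionTable:
// only (previousTag, currentTag) pairs that were actually observed are stored.
class SparseTransitionTable {
    private final Map<String, Map<String, Float>> counts = new HashMap<>();

    // Record one observed transition; a missing row or cell starts from 0.
    void update(String previousTag, String currentTag) {
        counts.computeIfAbsent(previousTag, k -> new HashMap<>())
              .merge(currentTag, 1f, Float::sum);
    }

    // Pairs that were never seen are not stored, so they read back as 0.
    float count(String previousTag, String currentTag) {
        Map<String, Float> row = counts.get(previousTag);
        return row == null ? 0f : row.getOrDefault(currentTag, 0f);
    }

    // Number of cells actually held in memory (all of them non-zero).
    int storedCells() {
        int n = 0;
        for (Map<String, Float> row : counts.values()) {
            n += row.size();
        }
        return n;
    }

    public static void main(String[] args) {
        SparseTransitionTable table = new SparseTransitionTable();
        String[] tags = {"AT", "NN", "VBD", "AT", "NN", "."};
        String previousTag = "^";
        for (String tag : tags) {
            table.update(previousTag, tag);
            // same sentence-boundary rule as the question's readBrownCorpus
            previousTag = tag.equals(".") ? "^" : tag;
        }
        System.out.println(table.count("AT", "NN")); // 2.0
        System.out.println(table.count("NN", "AT")); // 0.0
        System.out.println(table.storedCells());     // 5
    }
}
```

Because only observed pairs are stored, seeing a new tag no longer requires touching every existing row, so the cost of each new tag stays constant instead of growing with the size of the table.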

Answer 2

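I don't think a Map is the right data structure for this task. Multiset from the Guava library is well suited to counting elements. As the official documentation points out, code like this: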

Map<String, Integer> counts = new HashMap<String, Integer>();

for (String word : words) {
    Integer count = counts.get(word);
    if (count == null) {
        counts.put(word, 1);
    } else {
        counts.put(word, count + 1);
    }
}

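can always be rewritten with a Multiset, whose count() method makes it easy to check how many times a particular element has been added to the data structure.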
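A minimal sketch of the Multiset version, assuming Guava is on the classpath. HashMultiset.create(), add(), count() and elementSet() are Guava's API; the class name MultisetCounting and the "previousTag currentTag" string used as a transition key are hypothetical illustrations, not part of the question's code. Note that count() returns 0 for an element that was never added, which gives the same "treat absence as 0" behavior described in the first answer.

```java
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

// Sketch only: assumes Guava is on the classpath.
class MultisetCounting {

    // Equivalent of the HashMap loop above: add() replaces get / null check / put.
    static Multiset<String> countWords(String[] words) {
        Multiset<String> counts = HashMultiset.create();
        for (String word : words) {
            counts.add(word);
        }
        return counts;
    }

    // Hypothetical: count tag transitions by using "previousTag currentTag"
    // strings as elements, resetting to "^" after a sentence-final ".".
    static Multiset<String> countTransitions(String[] tags) {
        Multiset<String> transitions = HashMultiset.create();
        String previousTag = "^";
        for (String tag : tags) {
            transitions.add(previousTag + " " + tag);
            previousTag = tag.equals(".") ? "^" : tag;
        }
        return transitions;
    }

    public static void main(String[] args) {
        Multiset<String> words = countWords("the cat saw the dog".split(" "));
        System.out.println(words.count("the"));        // 2
        System.out.println(words.count("cow"));        // 0
        System.out.println(words.elementSet().size()); // 4

        Multiset<String> transitions =
            countTransitions(new String[] {"AT", "NN", "VBD", "AT", "NN", "."});
        System.out.println(transitions.count("AT NN")); // 2
    }
}
```

A HashMultiset only keeps elements whose count is greater than zero, so, like the sparse Map approach, it never spends memory on pairs that were not observed.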

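Perhaps this change will improve your performance. But, as suggested before, you should also run your program under a profiler and examine the code carefully, so that you can see which parts consume the most resources.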

100% CPU usage when running a Java program
