Question

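I have the following layout in a class named Extract. The class has a main method that is supposed to read lines from a text file and write them to another text file. Here is the outline: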

 import java.io.*;
 import java.util.*;
 // Stanford parser imports (edu.stanford.nlp.*) omitted, as in the original outline.

 public class Extract {
     public static void main(String[] args) {
         // file_from, file_to, and lp (the parser) are assumed to be defined elsewhere.
         int count = 0;
         // try-with-resources closes both files, so buffered output is flushed.
         try (Scanner br = new Scanner(new FileInputStream(file_from)); // file to read from
              PrintWriter out = new PrintWriter(
                      new BufferedWriter(new FileWriter(file_to)))) { // file to write to

             while (br.hasNextLine()) {
                 // Read the next batch of up to 1000 lines.
                 ArrayList<String> sentences = new ArrayList<String>();
                 String some_sentence;
                 // Check hasNextLine() here too: nextLine() throws when the input runs out.
                 for (int i = 0; i < 1000 && br.hasNextLine(); i++) {
                     some_sentence = br.nextLine();
                     sentences.add(some_sentence);
                 }
                 for (int i = 0; i < sentences.size(); i++) {
                     some_sentence = sentences.get(i);
                     // prepare sentence to be parsed
                     Tree parsed = lp.parse(some_sentence);
                     TreebankLanguagePack tlp = new PennTreebankLanguagePack();
                     GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
                     GrammaticalStructure gs = gsf.newGrammaticalStructure(parsed);
                     Collection<TypedDependency> tdl = gs.typedDependencies();
                     Iterator<TypedDependency> itr = tdl.iterator();

                     out.println(some_sentence);
                     out.println("\n");
                     System.out.println(++count);
                     while (itr.hasNext()) {
                         TypedDependency temp = itr.next();
                         out.println(temp);
                     }
                 }
             }

         } catch (Exception e) {
             System.out.println("Something failed");
             e.printStackTrace();
         }
     }
 }

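Given that I fill the sentences list with 1000 new strings on every iteration of the while loop, could this be causing my program to stop? The program exits with the following error message: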

NOT ENOUGH MEMORY TO PARSE SENTENCES OF LENGTH 500
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.createArrays(ExhaustivePCFGParser.java:2203)
        at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.considerCreatingArrays(ExhaustivePCFGParser.java:2173)
        at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:346)
        at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.parseInternal(LexicalizedParserQuery.java:238)
        at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.parse(LexicalizedParserQuery.java:530)
        at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:301)
        at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:279)
        at Pubmedparse2.main(Pubmedparse2.java:52)

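Is garbage collection failing to work properly, given that I leave 1000 dead String objects behind on every iteration of the while loop? (Note: the package listed in the error is the one used to parse sentences into grammatical relations.)

Thanks for your help.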

Answer 1

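The garbage collector is a mysterious beast that can surprise even experienced developers in unexpected ways, but I doubt it is involved here (unless your input file is larger than 10 MB).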

My guess is that you have made a mistake in your code. Use a real debugger or poor man's debugging (print statements) to see what the code actually does.
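As a rough illustration of poor man's debugging for this case: the error message mentions "SENTENCES OF LENGTH 500", so it may be worth printing how many words each input line contains before it is handed to the parser. The sketch below is only that, a sketch. The class name `LineLengthDebug` and the helpers `wordCount` and `longestLineIndex` are made up for illustration, and splitting on whitespace is just an approximation of how the parser tokenizes.

```java
import java.util.Arrays;
import java.util.List;

public class LineLengthDebug {
    // Hypothetical helper: counts whitespace-separated tokens as a rough
    // proxy for the sentence length the parser will see.
    static int wordCount(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Returns the index of the line with the most tokens, or -1 for an empty list.
    static int longestLineIndex(List<String> lines) {
        int best = -1;
        int bestCount = -1;
        for (int i = 0; i < lines.size(); i++) {
            int n = wordCount(lines.get(i));
            if (n > bestCount) {
                bestCount = n;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample data standing in for lines read from the input file.
        List<String> lines = Arrays.asList(
                "A short sentence.",
                "",
                "This line has quite a few more words in it.");
        for (int i = 0; i < lines.size(); i++) {
            System.err.println("line " + i + ": " + wordCount(lines.get(i)) + " words");
        }
        System.err.println("longest line index: " + longestLineIndex(lines));
    }
}
```

In the question's loop, the same kind of print statement could go right before `lp.parse(some_sentence)`. If one line turns out to hold several hundred words, the input probably has not been split into sentences, and that would explain the memory demand better than any garbage collection problem.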

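[EDIT] It looks like you are using the Stanford NLP parser.

From the documentation:

at least 100MB to run as a PCFG parser on sentences up to 40 words in length; typically around 500MB of memory to be able to parse similarly long typical-of-newswire sentences using the factored model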

Check the documentation of your Java VM to find out how to give it more memory.
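On HotSpot-based JVMs the maximum heap size is usually set with the `-Xmx` option, for example `java -Xmx1g Extract` (the 1 GB figure is only an illustration, not a recommendation from the parser documentation). To see how much heap the JVM actually has, something like the following can be run with and without the flag. The class name `HeapReport` and the helper `toMiB` are made up for this sketch; the `Runtime` calls are standard Java.

```java
public class HeapReport {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the upper limit the heap may grow to (controlled by -Xmx).
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): the heap currently reserved by the JVM.
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): the unused part of the currently reserved heap.
        System.out.println("free heap:  " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

If the reported maximum is well below the roughly 500MB mentioned in the quote above, raising it is the first thing to try.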
