Unexpected output from a summarizer in Java

Time: 2012-03-16 05:23:11

Tags: java

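Following up on this question: Score each sentence in a line based upon a tag and summarize the text. (Java), I am working on a summarizer in Java. With everything from that question now in place, one small problem remains. To restate what I am trying to do: from a given text file, I want to extract the individual sentences, score each one based on certain tags, and finally write the high-scoring sentences to a file that holds the summary. Here is the relevant part of the code (many thanks to the answerer of the question above):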

        // Requires java.text.BreakIterator, java.util.ArrayList, java.util.regex.Matcher and java.util.regex.Pattern.
        ArrayList<Integer> scoreTracker = new ArrayList<Integer>();

        // "/JJ" is the adjective tag. To match more tags, separate them with "|", e.g. /JJ|/RB|/NN.
        Pattern tagFinder = Pattern.compile("/JJ|" + freqWords);
        while ((line = reader.readLine()) != null) { // scan the input file line by line
            // One line may contain several sentences, so split it with a BreakIterator.
            BreakIterator bi = BreakIterator.getSentenceInstance();
            bi.setText(line);
            int end, start = bi.first();
            while ((end = bi.next()) != BreakIterator.DONE) { // for each sentence in the line
                String sentence = line.substring(start, end); // the current sentence
                String tagged = tagger.tagString(sentence); // tag the sentence
                int score = 0; // score of this sentence
                Matcher tag = tagFinder.matcher(tagged); // find tag matches in the tagged sentence
                while (tag.find()) // for each match
                    score++; // add one to the score
                scoreTracker.add(score);
                if (score > 5) // write the sentence (not the whole line) to the summary
                    writerForTempFile.write(sentence);
                start = end; // the next sentence starts where this one ended
            }
        }

        System.out.println(scoreTracker);
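
To look at the scoring step in isolation, the counting loop can be pulled out into a small standalone program. This is only a minimal sketch: the class name `TagScoreSketch`, the method `score`, and the hard-coded tagged string are hypothetical, and the `word/TAG` form is assumed from the `/JJ` pattern used above rather than taken from any real tagger output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagScoreSketch {

    // Counts how many times the tag pattern matches in an already-tagged sentence.
    public static int score(String taggedSentence, Pattern tagFinder) {
        Matcher m = tagFinder.matcher(taggedSentence);
        int score = 0;
        while (m.find()) {
            score++;
        }
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical tagger output in word/TAG form (an assumption for illustration).
        String tagged = "This/DT is/VBZ a/DT quick/JJ brown/JJ fox/NN ./.";
        System.out.println("adjectives only: " + score(tagged, Pattern.compile("/JJ")));
        System.out.println("adjectives or nouns: " + score(tagged, Pattern.compile("/JJ|/NN")));
    }
}
```

Note that the pattern is a plain substring match, so `/JJ` would also match longer tags that begin with it, such as `/JJR`. Feeding one tagged sentence at a time through a helper like this makes it easier to compare a score by hand against the values in `scoreTracker`.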

Now the problem. I have the following text file for testing:

This is a sample text.
This is a new line in the sample document.This next line is just to test adjacent sentences in the document.Because test runs suggest that immediate sentences are included in the final result due to new line delimiter usage and not sentence terminator usage.

Then we have a paragraph space.
Then there is this long line that has many words ,so it should be important. Should it?

That should be enough for testing. 

test test test test test test.

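It contains 9 sentences, but the program finds only 7. It does score them. My condition is to write only sentences with score > 5, yet some sentences with score < 5 are written as well. I added an ArrayList to track the score of each sentence; that is how I found out that only 7 sentences are scored and that sentences scoring less than 5 are also written. Please bear with me: after a lot of trial and error, I cannot find where I am going wrong.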
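
One way to see where the count of 7 comes from is to print every span that `BreakIterator` reports for a line, wrapped in brackets so that any text treated as a single sentence is visible as such. The following is a minimal diagnostic sketch, not part of the original program: the class name `SentenceSpanSketch`, the method `sentences`, and the two sample strings in `main` are made up for illustration, and `Locale.ENGLISH` is an assumption about the input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSpanSketch {

    // Returns the spans that the sentence BreakIterator reports for one line of text.
    public static List<String> sentences(String line) {
        List<String> result = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        bi.setText(line);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            result.add(line.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative lines only; replace them with lines read from the test file.
        String[] lines = {
            "First sentence. Second sentence.",
            "No space after the period.Next sentence."
        };
        for (String line : lines) {
            List<String> spans = sentences(line);
            System.out.println(spans.size() + " span(s) for: " + line);
            for (String span : spans) {
                System.out.println("  [" + span + "]");
            }
        }
    }
}
```

Running this over each line of the test file and comparing the bracketed spans with the 9 expected sentences should show which sentences are being grouped together or skipped before they are scored.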

0 answers:

No answers