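I am trying to split a text into individual paragraphs. I did find this question and this question. However, I have already worked out how to detect the paragraphs. What I cannot do is save them.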
One morning, when Gregor Samsa woke from troubled dreams, he found
himself transformed in his bed into a horrible vermin. He lay on
his armour-like back, and if he lifted his head a little he could
see his brown belly, slightly domed and divided by arches into stiff
sections. The bedding was hardly able to cover it and seemed ready
to slide off any moment. His many legs, pitifully thin compared
with the size of the rest of him, waved about helplessly as he
looked.

"What's happened to me?" he thought. It wasn't a dream. His room,
a proper human room although a little too small, lay peacefully
between its four familiar walls. A collection of textile samples
The text above would count as two paragraphs. Below is the function I use for paragraph detection.
    public List<Paragraph> findParagraph(List<String> originalBook)
    {
        List<Paragraph> paragraphs = new LinkedList<Paragraph>();
        List<String> sentences = new LinkedList<String>();
        for(int i=0;i<originalBook.size();i++)
        {
            //if it isn't a blank line
            //don't count I, II symbols
            if(!originalBook.get(i).equalsIgnoreCase("") & originalBook.get(i).length()>2)
            {
                sentences.add(originalBook.remove(i));
                //if the line ahead of where you are is a blank line, you've reached the end of the paragraph
                if(i < originalBook.size()-1)
                {
                    if(originalBook.get(i+1).equalsIgnoreCase("") )
                    {
                        Paragraph paragraph = new Paragraph();
                        List<String> strings = sentences;
                        paragraph.setSentences(strings);
                        paragraphs.add(paragraph);
                        sentences.clear();
                    }
                }
            }
        }
        return paragraphs;
    }
This is the class that defines my paragraph:
    public class Paragraph
    {
        private List<String> sentences;
        public Paragraph()
        {
            super();
        }
        public List<String> getSentences() {
            return sentences;
        }
        public void setSentences(List<String> sentences) {
            this.sentences = sentences;
        }
    }
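I can detect the paragraphs just fine, but I clear all the sentences, and the list I get back only contains the last paragraph. I have been trying to come up with a solution and have not managed to. Can anyone offer any advice?
I have tried to be as thorough as possible in my explanation. I can add more details if necessary.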
Answer 0 (score: 2)
The problem is in this block:
    Paragraph paragraph = new Paragraph();
    List<String> strings = sentences; // <-- !!!!!
    paragraph.setSentences(strings);
    paragraphs.add(paragraph);
    sentences.clear();
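You use the sentences variable to point every paragraph at the same object, so in the end all Paragraph objects point to the same List<String>. Any change you make through sentences therefore changes that one List<String>, and every Paragraph object sees the change, because they all reference the same instance.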
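It is a bit like the sentences list being a balloon: what you are doing is handing each of your Paragraph objects a string tied to that balloon (plus one more string held by the sentences variable itself). If any one of the objects (or the sentences reference) decides to follow its string and pop the balloon, everyone sees the change.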
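The shared-reference behaviour described above can be reproduced in isolation. The following is only a minimal sketch; the class name AliasingDemo and its two helper methods are made up for illustration and are not part of the question's code. It compares assigning the list (a second reference to the same object) with copying it (an independent object).

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, not part of the original code.
public class AliasingDemo {

    // Assigning a list only copies the reference: both names see the clear().
    static int[] sharedThenCleared() {
        List<String> sentences = new LinkedList<>();
        sentences.add("first line");
        List<String> strings = sentences;   // same object, second reference
        sentences.clear();                  // empties the one shared list
        return new int[] { sentences.size(), strings.size() };
    }

    // Copying the list creates a second object that clear() does not touch.
    static int[] copiedThenCleared() {
        List<String> sentences = new LinkedList<>();
        sentences.add("first line");
        List<String> copy = new LinkedList<>(sentences); // independent copy
        sentences.clear();
        return new int[] { sentences.size(), copy.size() };
    }

    public static void main(String[] args) {
        int[] shared = sharedThenCleared();
        int[] copied = copiedThenCleared();
        System.out.println(shared[0] + " " + shared[1]); // prints "0 0"
        System.out.println(copied[0] + " " + copied[1]); // prints "0 1"
    }
}
```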
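The solution is simple. Skip sentences.clear() and just use List<String> strings = new LinkedList<>() instead of List<String> strings = sentences. That way, each of your Paragraph objects gets a different List<String> object to hold its sentences, and changes you make to any one of them are independent of the others. If you do this, you can also skip declaring sentences at the start of the method.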
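One way the whole method might look with a fresh list per paragraph is sketched below. Treat it as an illustration under a few assumptions rather than the definitive fix: the names ParagraphSplitter, findParagraphs and toParagraph are hypothetical, lines are added directly to the current paragraph's list, a blank line is detected with trim().isEmpty() rather than equalsIgnoreCase(""), and the loop reads the input without calling remove(i), which goes beyond what the answer itself suggests.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical wrapper class so the sketch compiles on its own.
public class ParagraphSplitter {

    // Same shape as the Paragraph class from the question.
    static class Paragraph {
        private List<String> sentences;
        public List<String> getSentences() { return sentences; }
        public void setSentences(List<String> sentences) { this.sentences = sentences; }
    }

    static List<Paragraph> findParagraphs(List<String> lines) {
        List<Paragraph> paragraphs = new LinkedList<>();
        List<String> current = new LinkedList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line closes the paragraph collected so far, if any.
                if (!current.isEmpty()) {
                    paragraphs.add(toParagraph(current));
                    current = new LinkedList<>(); // new object, nothing is cleared
                }
            } else if (line.length() > 2) {
                // Short lines such as "I" or "II" are skipped, as in the question.
                current.add(line);
            }
        }
        // The last paragraph may not be followed by a blank line.
        if (!current.isEmpty()) {
            paragraphs.add(toParagraph(current));
        }
        return paragraphs;
    }

    private static Paragraph toParagraph(List<String> sentences) {
        Paragraph paragraph = new Paragraph();
        paragraph.setSentences(sentences);
        return paragraph;
    }

    public static void main(String[] args) {
        List<String> book = Arrays.asList(
                "One morning, when Gregor Samsa woke from troubled dreams, he found",
                "looked.",
                "",
                "\"What's happened to me?\" he thought. It wasn't a dream. His room,",
                "a proper human room although a little too small, lay peacefully");
        System.out.println(findParagraphs(book).size()); // prints 2
    }
}
```

Because every Paragraph keeps its own list, nothing done while collecting the next paragraph can reach back into the ones already stored.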
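Answer 1 (score: 0)
You can make the code cleaner and more efficient by reading the text line by line instead of computing indices and nesting multiple if statements.
Sample: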
    // needs: import java.io.File; import java.util.Scanner;
    Scanner scan = new Scanner(new File("text.txt"));
    String parag = "";
    while(scan.hasNextLine())
    {
        String s = scan.nextLine();
        if(s.trim().length() != 0)
            parag += s + "\n"; //another line of the current paragraph
        else
        {
            System.out.println(parag); //blank line: the paragraph is complete
            parag = "";
        }
    }
    System.out.println(parag); //last paragraph
    scan.close();
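Since the question is about saving the paragraphs rather than printing them, the same Scanner loop could collect them into a list. This is only a sketch: the class name ScannerParagraphs and the method readParagraphs are hypothetical, the Scanner reads from an in-memory string instead of text.txt so the example runs on its own, and runs of consecutive blank lines are treated as a single separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name, used only to make the sketch self-contained.
public class ScannerParagraphs {

    // Collects each run of non-blank lines into one paragraph string.
    static List<String> readParagraphs(Scanner scan) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        while (scan.hasNextLine()) {
            String line = scan.nextLine();
            if (!line.trim().isEmpty()) {
                if (current.length() > 0) {
                    current.append('\n'); // keep the line breaks inside a paragraph
                }
                current.append(line);
            } else if (current.length() > 0) {
                paragraphs.add(current.toString()); // blank line ends the paragraph
                current.setLength(0);               // reuse the builder, the String is already stored
            }
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());     // last paragraph
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\n\nthird line\n";
        try (Scanner scan = new Scanner(text)) {
            System.out.println(readParagraphs(scan).size()); // prints 2
        }
    }
}
```

Resetting the StringBuilder is safe here because toString() has already produced an independent String before the builder is emptied.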