I have a text file:
The annual festival. Of every man is the fund which originally.
Supplies it with all the necessaries? And conveniences of birth which
it annually forgone! And which consist always either in the immediate
produce of that action, or in what is wasted with that produce from
other nations.
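I need to split it into sentences. This is a simplified version, but you can assume that every sentence ends with one of the following:
.
?
!
and that the punctuation mark is followed by a space and a capital letter.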
I tried various approaches using the function strsplit, which come close but are still wrong.
strsplit(textfile2,{'. ','! ','? '}) % doesn't work fully
textfile2 =
'The annual festival' [1x80 char] [1x53 char] [1x133 char]
I would like my output to be a cell array of strings, like:
The annual festival
Of every man is the fund which originally
Supplies it with all the necessaries
And conveniences of birth which it annually forgone
And which consist always either in the immediate produce of that action, or in what is wasted with that produce from other nations
but with the ending punctuation removed from each one. Any ideas?
Answer 0 (score: 2)
This can be done using regexp in MATLAB.
text='The annual festival. Of every man is the fund which originally. Supplies it with all the necessaries? And conveniences of birth which it annually forgone! And which consist always either in the immediate produce of that action, or in what is wasted with that produce from other nations.'
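% Split at sentence-ending punctuation only; a comma in the character class
% would also break the last sentence at "action,".
SplitString=regexp(text,'[.?!]','split');
for it=1:length(SplitString)
    % strtrim drops the space left over after each punctuation mark
    disp(strtrim(SplitString{it}));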
end
Answer 1 (score: 2)
Use curly braces to extract the strings from the cell array returned by strsplit:
x{1}
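A minimal sketch of how the strsplit attempt from the question might be made to work. It assumes the remaining problems are the line breaks inside the file and the final period; the [1x80 char] cell in the question's output is consistent with two sentences still joined by a newline. The sample string and the names raw, flat, and parts are illustrative, not from the original post.

```matlab
% Hypothetical sample with a line break after the second sentence.
raw = sprintf('First one. Second one.\nThird one? Fourth one!');

% Turn line breaks and repeated whitespace into single spaces so that
% every end mark is followed by exactly one space.
flat = strtrim(regexprep(raw, '\s+', ' '));

% Remove the end mark of the last sentence, which has no space after it.
flat = regexprep(flat, '[.?!]$', '');

% The delimiters from the question now match at every sentence boundary.
parts = strsplit(flat, {'. ', '! ', '? '});

% Curly braces return the char vector itself rather than a 1x1 cell.
first = parts{1};
```

With this normalization, parts holds one cell per sentence and first is a plain char vector.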
If you want to keep the punctuation at the end of each sentence:
sentences = regexp(textarray,'\S.*?[\.\!\?]','match')
The correct way to split, dropping the trailing punctuation and keeping the last sentence:
sentences = regexp(text,'[\.\!\?]\s*','split')
A quick way to check the output: char(sentences).
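The question also says that each end mark is followed by a space and a capital letter. Here is a hedged sketch that uses that rule through a lookahead, so that a period inside something like a decimal number would not trigger a split. It assumes the file content is already in a char vector (for example read with fileread); the names raw and flat are illustrative.

```matlab
% Sample text from the question, with line breaks as in the file.
raw = sprintf(['The annual festival. Of every man is the fund which originally.\n' ...
               'Supplies it with all the necessaries? And conveniences of birth which\n' ...
               'it annually forgone! And which consist always either in the immediate\n' ...
               'produce of that action, or in what is wasted with that produce from\n' ...
               'other nations.']);

% Collapse line breaks and repeated whitespace into single spaces.
flat = strtrim(regexprep(raw, '\s+', ' '));

% Split at '.', '?' or '!' when it is followed by whitespace and a capital
% letter, or when it is the last character of the text.
sentences = regexp(flat, '[.?!](\s+(?=[A-Z])|$)', 'split');

% Drop the empty string that the split leaves after the final end mark.
sentences = sentences(~cellfun(@isempty, sentences));
```

On the sample text this should give a 1x5 cell array of sentences without end marks; the comma in the last sentence is left alone because it is not in the character class.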