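I have a vector of strings, and I want to separate the last sentence from each string in R.
Sentences may end with full stops (.) or even exclamation marks (!). Hence I am confused as to how to separate the last sentence from a string in R.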
Answer 0 (score: 2)
You can use strsplit to get the last sentence from each string, as follows:
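## paragraph <- "Your vector here"
# Split on sentence-ending punctuation. Note that strsplit() drops the
# delimiters, so the returned sentences lose their final ".", "!" or "?".
result <- strsplit(paragraph, "\\.|\\!|\\?")
# Keep the last piece of each string and strip surrounding whitespace.
last.sentences <- sapply(result, function(x) {
  trimws(x[length(x)])
})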
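If the final punctuation should be kept, one option is to split on the whitespace that follows a sentence-ending mark rather than on the mark itself. The following is a minimal sketch, assuming sentences are separated by whitespace; last_sentence is a hypothetical helper name, not part of the answer above.

```r
# Hypothetical helper: return the last sentence of each string,
# keeping its terminating punctuation.
last_sentence <- function(x) {
  # Split on whitespace that is preceded by ".", "!" or "?".
  # The lookbehind needs perl = TRUE; the punctuation itself is not consumed.
  parts <- strsplit(trimws(x), "(?<=[.!?])\\s+", perl = TRUE)
  # Take the last piece of each string; empty input yields "".
  vapply(parts, function(p) if (length(p)) p[length(p)] else "", character(1))
}

last_sentence(c("This is a sentence. And another sentence!",
                "Single sentences are not a problem."))
```

Because the split only happens on whitespace, sentences that run together without a space (for example "sentences?It") are still treated as one sentence.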
Answer 1 (score: 1)
If your input is clean enough (in particular, if there is a space between sentences), you can use:
sub(".*(\\.|\\?|\\!) ", "", trimws(yourvector))
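It finds the longest substring that ends with a punctuation mark followed by a space and removes it. I added trimws in case some of your strings have trailing whitespace.
Example: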
u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")
sub(".*(\\.|\\?|\\!) ", "", trimws(u))
# [1] "And another sentence!"
# [2] "You see ?"
# [3] "Single sentences are not a problem."
# [4] "What if there are no spaces between sentences?It won't work."
# [5] "Multiple marks don't break my solution!!"
# [6] "!"
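Cases 4 and 6 above can be handled by a looser variant that makes the space optional and only treats a mark as a boundary when real text follows it. This is a sketch rather than a general solution: last_sentence_loose is a hypothetical name, and the pattern assumes that ".", "?" and "!" never occur inside a sentence, so it would also cut at a decimal such as "3.14".

```r
# Hypothetical helper: remove everything up to the last sentence boundary.
# A boundary is ".", "?" or "!", then optional whitespace, then a character
# that is neither whitespace nor one of those marks (checked by the lookahead).
last_sentence_loose <- function(x) {
  sub("^.*[.?!]\\s*(?=[^.?!\\s])", "", trimws(x), perl = TRUE)
}

u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")
last_sentence_loose(u)
```

With this pattern the fourth string yields "It won't work." and the sixth is returned unchanged, since none of its marks is followed by further text.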
Answer 2 (score: 0)
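This regex uses $ to anchor the match to the end of the string. At the front, the lookbehind (?<=...) requires the match to start right after a ". " or "! ", that is, after the end of the previous sentence, without including that punctuation in the match. The ^ alternative covers strings that contain only a single sentence. Because a regex engine returns the leftmost match and ^ also satisfies the lookbehind, the body of the pattern must not be allowed to cross a sentence boundary: the (?![.!]\s) guard in front of each character enforces this, so the only match that can reach $ is the last sentence, including its final '.' or '!'.
s <- "Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused as to how to separate the last sentence from a string in R."
library(stringr)
str_extract(s, "(?<=(\\.\\s|\\!\\s|^))(?:(?![.!]\\s).)+$")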
which yields
# [1] "Hence i am confused as to how to separate the last sentence from a string in R."
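The same lookbehind idea can be sketched in base R for a whole vector, without stringr. The assumptions here are mine: last_sentence_re is a hypothetical name, "?" is also treated as a sentence end, and a per-character guard keeps the match from spanning a sentence boundary (a regex engine returns the leftmost match, and ^ satisfies the lookbehind at the start of the string).

```r
# Hypothetical helper: extract the last sentence of each string with one regex.
last_sentence_re <- function(x) {
  # (?<=[.!?]\s|^)     start right after "<mark><space>" or at the string start
  # (?:(?![.!?]\s).)+  consume characters, but never step onto "<mark><space>"
  # $                  the match must run to the end of the string
  pattern <- "(?<=[.!?]\\s|^)(?:(?![.!?]\\s).)+$"
  x <- trimws(x)
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m > 0
  out <- rep(NA_character_, length(x))  # NA where nothing matched
  out[hit] <- regmatches(x, m)
  out
}

last_sentence_re(c("This is a sentence. And another sentence!",
                   "Single sentences are not a problem."))
```

Returning NA for empty or missing input keeps the result the same length as the input vector.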