Question

I have a vector of strings, and I want to separate the last sentence from each string in R.

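Sentences may end with full stops (.) or even exclamation marks (!). Hence I am confused as to how to separate the last sentence from a string in R.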

Answer 1

You can use strsplit to get the last sentence from each string, as follows:

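# Example input; replace with your own character vector
paragraph <- c("This is a sentence. And another sentence!",
               "Single sentences are not a problem.")

# Split each string on '.', '!' or '?'
result <- strsplit(paragraph, "\\.|\\!|\\?")

# Take the last piece of each split and trim surrounding whitespace
last.sentences <- sapply(result, function(x) {
  trimws(x[length(x)])
})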
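Because strsplit discards whatever the pattern matches, splitting on the punctuation marks removes them, so each extracted sentence loses its final '.', '!' or '?'. A minimal sketch that keeps the punctuation, assuming sentences are separated by whitespace after '.', '!' or '?': the hypothetical helper last_sentence splits on that whitespace instead, using a lookbehind (which requires perl = TRUE).

```r
# Hypothetical helper: return the last sentence of each string, keeping its
# terminal punctuation. Assumes sentences are separated by whitespace that
# follows '.', '!' or '?'.
last_sentence <- function(x) {
  # Split on whitespace preceded by sentence-ending punctuation;
  # the lookbehind (?<=...) needs perl = TRUE.
  pieces <- strsplit(trimws(x), "(?<=[.!?])\\s+", perl = TRUE)
  # Take the last piece; return "" for empty input strings.
  vapply(pieces, function(p) if (length(p)) p[length(p)] else "", character(1))
}

paragraph <- c("This is a sentence. And another sentence!",
               "Single sentences are not a problem.",
               "Is this a question? Yes, it is.")
last_sentence(paragraph)
# returns c("And another sentence!", "Single sentences are not a problem.", "Yes, it is.")
```

Like any punctuation-based split, this will also break on abbreviations such as "e.g. " or "Mr. ".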

Answer 2

If your input is clean enough (in particular, if there is a space between sentences), you can use:

sub(".*(\\.|\\?|\\!) ", "", trimws(yourvector))

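It finds the longest leading substring that ends with a punctuation mark followed by a space, and removes it.

I added trimws in case some of your strings have trailing whitespace.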

Example:

u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")

sub(".*(\\.|\\?|\\!) ", "", trimws(u))
# [1] "And another sentence!"                                       
# [2] "You see ?"                                                   
# [3] "Single sentences are not a problem."                         
# [4] "What if there are no spaces between sentences?It won't work."
# [5] "Multiple marks don't break my solution!!"                    
# [6] "!"
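The approach relies on ".*" being greedy, so it runs to the last punctuation-plus-space in the string. A minimal sketch contrasting it with the lazy ".*?" on a made-up three-sentence string (the variable x is hypothetical, and [.?!] is used as an equivalent shorthand for (\\.|\\?|\\!)):

```r
# Hypothetical three-sentence input
x <- "This is one. This is two. This is three."

# Greedy '.*' extends to the LAST punctuation mark followed by a space,
# so everything except the final sentence is removed.
greedy <- sub(".*[.?!] ", "", x)

# Lazy '.*?' (perl = TRUE) stops at the FIRST punctuation mark followed by
# a space, so only the first sentence is removed.
lazy <- sub(".*?[.?!] ", "", x, perl = TRUE)

greedy  # "This is three."
lazy    # "This is two. This is three."
```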

Answer 3

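This regex uses $ to anchor to the end of the string and allows an optional '.' or '!' at the end. Before that, it expects the previous sentence to end with "." or "!" followed by whitespace. The positive lookbehind (?<=...) ensures that this "." or "!" is not itself part of the match. Including ^ as an alternative in the lookbehind also covers strings that consist of a single sentence.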

library(stringr)

s <- "Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused as to how to separate the last sentence from a string in R."
str_extract(s, "(?<=(\\.\\s|\\!\\s|^)).+(\\.|\\!)?$")

Output:

# [1] "Hence i am confused as to how to separate the last sentence from a string in R."
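For comparison, a minimal base-R sketch that avoids lookbehind and stringr entirely: it keeps the final run of characters that are not '.', '!' or '?', plus any terminal punctuation after it. The helper name last_sentence_re is hypothetical, and the approach assumes '.', '!' and '?' only occur at sentence ends (abbreviations or decimals like "3.5" would break it).

```r
# Hypothetical helper: keep the final "non-punctuation + optional closing
# punctuation" run of each string. Base R only; perl = TRUE for lazy '.*?'.
last_sentence_re <- function(x) {
  # '.*?' lazily consumes everything before the last stretch that contains
  # no '.', '!' or '?'; '[.!?]*' keeps closers such as "!!". Strings with no
  # match (e.g. "" or "!!!") are returned unchanged. The inner trimws drops
  # trailing spaces that would otherwise defeat the '$' anchor.
  trimws(sub(".*?([^.!?]+[.!?]*)$", "\\1", trimws(x), perl = TRUE))
}

s <- "Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused as to how to separate the last sentence from a string in R."
last_sentence_re(s)
# returns "Hence i am confused as to how to separate the last sentence from a string in R."
```

Unlike the space-based sub() in Answer 2, this sketch also handles a missing space between sentences, e.g. "sentences?It won't work." yields "It won't work.".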
