Separate the last sentence from a string in R

Time: 2017-04-10 20:51:32

Tags: r text text-mining

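I have a vector of strings, and I want to separate the last sentence from each string in R.

Sentences may end with full stops (.) or even exclamation marks (!). Hence I am confused as to how to separate the last sentence from a string in R.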

3 answers:

Answer 0 (score: 2)

You can use strsplit to get the last sentence from each string, as follows:

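## paragraph <- "Your vector here"
# Split on sentence terminators; strsplit drops the terminators themselves.
result <- strsplit(paragraph, "[.!?]")

# Keep the last piece of each string and trim surrounding whitespace.
last.sentences <- sapply(result, function(x) {
    trimws(x[length(x)])
})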
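
Because strsplit discards whatever the pattern matches, the approach above returns the last sentence without its final punctuation. If the punctuation should be kept, one option is to split on the whitespace that follows a terminator instead of on the terminator itself. The following is only a sketch under that assumption; `last_sentence` is a hypothetical helper name, and it relies on sentences being separated by whitespace.

```r
# Hypothetical helper (not part of the answer above): return the last
# sentence of each string, keeping its terminating punctuation.
last_sentence <- function(x) {
  # Split on whitespace that directly follows '.', '!' or '?'.
  # The lookbehind requires perl = TRUE.
  pieces <- strsplit(trimws(x), "(?<=[.!?])\\s+", perl = TRUE)
  vapply(pieces, function(p) {
    # Guard against an empty result for an empty input string.
    if (length(p) == 0) "" else p[length(p)]
  }, character(1))
}

u <- c("This is a sentence. And another sentence!",
       "Single sentences are not a problem.")
last_sentence(u)
# [1] "And another sentence!"               "Single sentences are not a problem."
```

Using vapply instead of sapply guarantees a character vector of the same length as the input, even when some strings are empty.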

Answer 1 (score: 1)

If your input is clean enough (in particular, if sentences are separated by a space), you can use:

sub(".*(\\.|\\?|\\!) ", "", trimws(yourvector))

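It finds the longest substring that ends with a punctuation mark followed by a space, and removes it.

I added trimws in case some of your strings have trailing whitespace.

Example: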

u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")

sub(".*(\\.|\\?|\\!) ", "", trimws(u))
# [1] "And another sentence!"                                       
# [2] "You see ?"                                                   
# [3] "Single sentences are not a problem."                         
# [4] "What if there are no spaces between sentences?It won't work."
# [5] "Multiple marks don't break my solution!!"                    
# [6] "!"  
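
Since the question asks to separate the last sentence rather than only extract it, the same greedy pattern can be given two capture groups so that both parts are returned. This is a sketch only: `split_last` and the column names `head` and `last` are hypothetical, and it shares the assumption that sentences are separated by a single space.

```r
# Hypothetical helper: split each string into everything before the last
# sentence ("head") and the last sentence itself ("last").
split_last <- function(x) {
  x <- trimws(x)
  # Greedy .* pushes the split point to the last "punctuation + space".
  pattern <- "^(.*[.?!]) (.*)$"
  has_prev <- grepl(pattern, x)
  data.frame(
    head = ifelse(has_prev, sub(pattern, "\\1", x), ""),
    last = ifelse(has_prev, sub(pattern, "\\2", x), x),
    stringsAsFactors = FALSE
  )
}

split_last(c("This is a sentence. And another sentence!",
             "Single sentences are not a problem."))
```

Strings with no earlier sentence boundary get an empty `head` and are returned whole in `last`.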

Answer 2 (score: 0)

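This regex anchors to the end of the string with $ and allows an optional '.' or '!' at the end. At the front, the lookbehind (?<=...) requires the match to start right after ". " or "! " (the end of the previous sentence) without including that punctuation in the match; the ^ alternative covers strings that contain only a single sentence. The body [^.!]+ cannot cross a '.' or '!', which forces the match to start after the last sentence boundary. As a consequence, a last sentence that itself contains an inner '.' or '!' is not matched.

s <- "Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused as to how to separate the last sentence from a string in R."
library(stringr)
str_extract(s, "(?<=(\\.\\s|\\!\\s|^))[^.!]+(\\.|\\!)?$")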

This yields:

# [1] "Hence i am confused as to how to separate the last sentence from a string in R."
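
If stringr is not available, a similar end-anchored extraction can be sketched in base R with regexpr and regmatches. This is not the pattern used in the answer above: `last_sentence_base` is a hypothetical name, the pattern also treats '?' as a terminator, and it assumes the last sentence contains no inner '.', '!' or '?'.

```r
# Hypothetical base R variant: take the final run of non-terminator
# characters plus any trailing terminators, anchored at the end with $.
last_sentence_base <- function(x) {
  x <- trimws(x)
  m <- regexpr("[^.!?]+[.!?]*$", x)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > 0
  # regmatches() returns only the successful matches, in order.
  out[hit] <- trimws(regmatches(x, m))
  out
}

s <- "Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused as to how to separate the last sentence from a string in R."
last_sentence_base(s)
# [1] "Hence i am confused as to how to separate the last sentence from a string in R."
```

Strings where the pattern finds nothing, such as an empty string, come back as NA rather than being dropped.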