Comparing matching strings

Time: 2016-07-25 18:59:36

Tags: regex r string

I want to write a function that compares two strings in R. More precisely, if I have this data:

data <- list(
  "First sentence.",
  "Very first sentence.",
  "Very first and only one sentences."
)

I would like the output to be:

[1] "Very"                    " and only one sentences"

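My output is built from all the substrings that are not contained in the previous string. For example:

Compare the 2nd with the 1st and remove the matching string ("first sentence.") from the 2nd, so the result is "Very".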

#       "First sentence."
#  "Very first sentence."
# match: ^^^^^^^^^^^^^^^

Now compare the 3rd with the 2nd and remove the matching string ("Very first") from the 3rd, so the result is " and only one sentences".

#       "Very first sentence."
#       "Very first and only one sentences."
# match: ^^^^^^^^^^

Then compare the 4th with the 3rd, and so on...

So, based on this example, my output should be:

c("Very", " and only one sentences")
# [1] "Very"                    " and only one sentences"

1 Answer:

Answer 0: (score: 2)

Here's a tidy approach:

library(dplyr)
library(tidyr)

# put data in a tibble
tibble(string = unlist(data)) %>% 
    # add ID column so we can recombine later
    tibble::rownames_to_column('id') %>% 
    # add a lagged column to compare against
    mutate(string2 = lag(string)) %>% 
    # break strings into words
    separate_rows(string) %>% 
    # evaluate the following calls rowwise (until regrouped)
    rowwise() %>% 
    # chop to rows with a string to compare against,
    filter(!is.na(string2), 
           # where the word is not in the comparison string
           !grepl(string, string2, ignore.case = TRUE)) %>% 
    # regroup by ID
    group_by(id) %>%
    # reassemble strings
    summarise(string = paste(string, collapse = ' '))

## # A tibble: 2 x 2
##      id                  string
##   <chr>                   <chr>
## 1     2                    Very
## 2     3 and only one sentences.

If you just want a vector, extract the string column by appending:
 ...
    %>% `[[`('string')

## [1] "Very"                    "and only one sentences."
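
For comparison, the same word-level idea can be sketched in base R without any packages. This is only a minimal sketch, not the answer's method: the helper name `new_words` is hypothetical, and it assumes words are separated by single spaces and counts a word as "already seen" only when it matches a whole word of the previous string (case-insensitively). The pipeline above instead uses `grepl()`, which also matches substrings and treats each word as a regular expression, so the two approaches can differ on other inputs.

```r
data <- list(
  "First sentence.",
  "Very first sentence.",
  "Very first and only one sentences."
)

# Hypothetical helper: keep the words of `current` that do not occur
# as whole words (ignoring case) in `previous`
new_words <- function(current, previous) {
  cur  <- strsplit(current,  " ", fixed = TRUE)[[1]]
  prev <- strsplit(previous, " ", fixed = TRUE)[[1]]
  keep <- !(tolower(cur) %in% tolower(prev))
  paste(cur[keep], collapse = " ")
}

strings <- unlist(data)

# Pair each string with the one before it: 2nd vs 1st, 3rd vs 2nd, ...
result <- mapply(new_words,
                 strings[-1],
                 strings[-length(strings)],
                 USE.NAMES = FALSE)
result
## [1] "Very"                    "and only one sentences."
```

As with the pipeline above, the trailing period stays attached to the last word and the leading space is dropped, so the result is close to, but not character-for-character identical with, the output requested in the question.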