Question

I want to write a function that compares two strings in R. More precisely, if I have this data:

data <- list(
  "First sentence.",
  "Very first sentence.",
  "Very first and only one sentences."
)

I want the output to be:

[1] "Very"                    " and only one sentences"

My output is built from all the substrings that are not contained in the previous string. For example:

Comparing the 2nd against the 1st, remove the matching string ("first sentence.") from the 2nd, so the result is "Very".

#       "First sentence."
#  "Very first sentence."
# match: ^^^^^^^^^^^^^^^

Now compare the 3rd against the 2nd and remove the matching string ("Very first") from the 3rd, so the result is " and only one sentences".

#       "Very first sentence."
#       "Very first and only one sentences."
# match: ^^^^^^^^^^

Then compare the 4th against the 3rd, and so on...

So for this example my output should be:

c("Very", " and only one sentences")
# [1] "Very"                    " and only one sentences"

Answer 1

Here's a tidy approach:

library(dplyr)
library(tidyr)

# put data in a data.frame
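# (tibble() and row_number() replace the deprecated data_frame() and add_rownames())
tibble(string = unlist(data)) %>% 
    # add ID column so we can recombine later
    mutate(id = as.character(row_number())) %>% 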
    # add a lagged column to compare against
    mutate(string2 = lag(string)) %>% 
    # break strings into words
    separate_rows(string) %>% 
    # evaluate the following calls rowwise (until regrouped)
    rowwise() %>% 
    # chop to rows with a string to compare against,
    filter(!is.na(string2), 
           # where the word is not in the comparison string
           !grepl(string, string2, ignore.case = TRUE)) %>% 
    # regroup by ID
    group_by(id) %>%
    # reassemble strings
    summarise(string = paste(string, collapse = ' '))

## # A tibble: 2 x 2
##      id                  string
##   <chr>                   <chr>
## 1     2                    Very
## 2     3 and only one sentences.

If you only want a vector, extract the string column:

... %>% 
    `[[`('string')

## [1] "Very"                    "and only one sentences."
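
For comparison, the same word-by-word logic can be sketched in base R. This is only a sketch, not the method from the pipeline above: the helper name drop_shared_words is made up for illustration, it splits on single spaces only, and it matches words literally (case-insensitively) instead of treating them as regular expressions. Like the pipeline, it keeps the trailing period and drops the leading space, so it does not reproduce the question's desired output character for character.

```r
# Hypothetical helper (not part of the answer above): keep the words of
# `current` that do not occur anywhere in `previous`, ignoring case.
drop_shared_words <- function(current, previous) {
  # assumption: words are separated by single spaces
  words <- strsplit(current, " ", fixed = TRUE)[[1]]
  # fixed = TRUE matches each word literally, so "." is not a regex wildcard;
  # tolower() on both sides stands in for ignore.case
  keep <- !vapply(
    words,
    function(w) grepl(tolower(w), tolower(previous), fixed = TRUE),
    logical(1)
  )
  paste(words[keep], collapse = " ")
}

data <- list(
  "First sentence.",
  "Very first sentence.",
  "Very first and only one sentences."
)

strings <- unlist(data)
# pair each string (from the 2nd on) with the one before it
result <- mapply(drop_shared_words,
                 strings[-1], strings[-length(strings)],
                 USE.NAMES = FALSE)
result
## [1] "Very"                    "and only one sentences."
```

Because the comparison is per word, a word is dropped whenever it appears anywhere in the previous string, not only inside one contiguous matching run.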
