I want to write a function that compares consecutive strings in R. More precisely, if I have this data:
data <- list(
"First sentence.",
"Very first sentence.",
"Very first and only one sentences."
)
I want the output to be:
[1] "Very" " and only one sentences"
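My output is built from all the substrings of each string that do not appear in the previous one. For example:
Compare the 2nd with the 1st and remove the matching string - "first sentence." - from the 2nd, so the result is "Very".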
# "First sentence."
# "Very first sentence."
# match: ^^^^^^^^^^^^^^^
Now compare the 3rd with the 2nd and remove the matching string - "Very first" - from the 3rd, so the result is " and only one sentences".
# "Very first sentence."
# "Very first and only one sentences."
# match: ^^^^^^^^^^
Then compare the 4th with the 3rd, and so on.
So for this example, my output should be:
c("Very", " and only one sentences")
# [1] "Very" " and only one sentences"
Answer 0 (score: 2)
Here is a tidy approach:
library(dplyr)
library(tidyr)
# put data in a tibble
tibble(string = unlist(data)) %>%
# add ID column so we can recombine later
mutate(id = as.character(row_number())) %>%
# add a lagged column to compare against
mutate(string2 = lag(string)) %>%
# break strings into words
separate_rows(string) %>%
# evaluate the following calls rowwise (until regrouped)
rowwise() %>%
# chop to rows with a string to compare against,
filter(!is.na(string2),
# where the word is not in the comparison string
!grepl(string, string2, ignore.case = TRUE)) %>%
# regroup by ID
group_by(id) %>%
# reassemble strings
summarise(string = paste(string, collapse = ' '))
## # A tibble: 2 x 2
##   id    string
##   <chr> <chr>
## 1 2     Very
## 2 3     and only one sentences.
If you just want a vector, extract the string column by appending:
...
%>% `[[`('string')
## [1] "Very" "and only one sentences."
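For comparison, the same word-level idea can be sketched in base R without any packages. This is only a rough sketch under stated assumptions: `new_words` is a hypothetical helper name (it does not come from any package), words are split on whitespace, and matching is whole-word and case-insensitive. That differs slightly from the `grepl()` call above, which treats each word as a regular expression matched anywhere in the previous string.

```r
data <- list(
  "First sentence.",
  "Very first sentence.",
  "Very first and only one sentences."
)

# Hypothetical helper: keep the words of `current` that do not occur
# in `previous` (whole-word, case-insensitive comparison).
new_words <- function(previous, current) {
  prev_words <- tolower(strsplit(previous, "\\s+")[[1]])
  cur_words <- strsplit(current, "\\s+")[[1]]
  paste(cur_words[!tolower(cur_words) %in% prev_words], collapse = " ")
}

strings <- unlist(data)

# Pair each string with the one before it: (1st, 2nd), (2nd, 3rd), ...
result <- mapply(new_words, head(strings, -1), tail(strings, -1),
                 USE.NAMES = FALSE)
result
## [1] "Very"                    "and only one sentences."
```

As with the pipeline above, this drops the leading space and keeps the trailing period, so it will not reproduce the exact strings in the question character for character.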