I have the strings a and b, which make up my data. My goal is to get a new variable containing the words that are repeated.
a = c("the red house av", "the blue sky", "the green grass")
b = c("the house built", " the sky of the city", "the grass in the garden")
data = data.frame(a, b)
Based on this answer, I can find out which words are duplicated using duplicated():
library(dplyr)
data = data %>% mutate(c = paste(a, b, sep = " "),
                       d = vapply(lapply(strsplit(c, " "), duplicated), paste, character(1L), collapse = " "))
But I cannot get the words themselves. The data I want should look like this:
> data.1
                 a                       b         d
1 the red house av         the house built the house
2     the blue sky     the sky of the city   the sky
3  the green grass the grass in the garden the grass
Any help with the above would be highly appreciated.
Answer 0 (score: 5)
a = c("the red house av", "the blue sky", "the green grass")
b = c("the house built", " the sky of the city", "the grass in the garden")
data <- data.frame(a, b, stringsAsFactors = FALSE)
func <- function(dta) {
words <- intersect( unlist(strsplit(dta$a, " ")), unlist(strsplit(dta$b, " ")) )
dta$c <- paste(words, collapse = " ")
return( as.data.frame(dta, stringsAsFactors = FALSE) )
}
library(dplyr)
data %>% rowwise() %>% do( func(.) )
Result:
#Source: local data frame [3 x 3]
#Groups: <by row>
#
## A tibble: 3 x 3
#                 a                       b         c
#*            <chr>                   <chr>     <chr>
#1 the red house av         the house built the house
#2     the blue sky     the sky of the city   the sky
#3  the green grass the grass in the garden the grass
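For comparison, the duplicated() attempt from the question can also be made to return words rather than TRUE/FALSE values, by using the logical vector to index the split words. This is only a minimal sketch: repeated_words is a hypothetical helper name, and trimming the input and splitting on runs of spaces are assumptions made to cope with the leading space in b.

```r
# Hypothetical helper (not part of the answer above): returns the words that
# occur more than once in a single string, each listed once.
repeated_words <- function(x) {
  words <- strsplit(trimws(x), " +")[[1]]   # split on one or more spaces
  paste(unique(words[duplicated(words)]), collapse = " ")
}

a <- c("the red house av", "the blue sky", "the green grass")
b <- c("the house built", " the sky of the city", "the grass in the garden")

# Combine both columns row by row, then apply the helper to each combined string
d <- vapply(paste(a, b), repeated_words, character(1L), USE.NAMES = FALSE)
d
```

Unlike intersect(), this variant would also report a word that is repeated within a alone or within b alone, so the two approaches only agree when that does not happen.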
Answer 1 (score: 1)
Here is another attempt using base R (no packages needed):
df$c <- apply(df,1,function(x)
paste(Reduce(intersect, strsplit(x, " ")), collapse = " "))
#                  a                       b         c
# 1 the red house av         the house built the house
# 2     the blue sky     the sky of the city   the sky
# 3  the green grass the grass in the garden the grass
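To see what the one-liner does, the steps can be unpacked for a single row. This is a sketch for illustration only; the variable names x, parts and common are made up here.

```r
# One row of the data, as apply() would pass it to the function
x <- c(a = "the blue sky", b = " the sky of the city")

# strsplit() returns one character vector per column;
# the leading space in b produces an empty string "" as its first element
parts <- strsplit(x, " ")

# Reduce(intersect, ...) keeps only the words present in every vector
common <- Reduce(intersect, parts)

paste(common, collapse = " ")
```

Because the empty string appears only in b, it drops out of the intersection. If both columns could start with a space, wrapping x in trimws() before splitting would be one way to keep "" out of the result.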
Data:
df <- structure(list(a = c("the red house av", "the blue sky", "the green grass"
), b = c("the house built", " the sky of the city", "the grass in the garden"
)), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")