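I have two columns and need a third column that "subtracts" one from the other using dplyr: the text in y with the text in x removed. Here is a very simple example for clarity. In my real case, split/separate approaches do not work.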
x <- c("FRANCE","GERMANY","RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
cities <- data.frame(x,y)
cities
        x              y
1  FRANCE   Paris FRANCE
2 GERMANY Berlin GERMANY
3  RUSSIA  Moscow RUSSIA
Expected result:
        x              y    new
1  FRANCE   Paris FRANCE  Paris
2 GERMANY Berlin GERMANY Berlin
3  RUSSIA  Moscow RUSSIA Moscow
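What I have tried so far (to no avail):
This returns the same data frame, but with the city removed rather than the country (the opposite of what I want):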
cities %>% mutate(new = setdiff(x,y))
        x              y     new
1  FRANCE   Paris FRANCE  FRANCE
2 GERMANY Berlin GERMANY GERMANY
3  RUSSIA  Moscow RUSSIA  RUSSIA
Conversely, setdiff with the arguments reversed just returns the original data:
cities %>% mutate(new = setdiff(y,x))
        x              y            new
1  FRANCE   Paris FRANCE   Paris FRANCE
2 GERMANY Berlin GERMANY Berlin GERMANY
3  RUSSIA  Moscow RUSSIA  Moscow RUSSIA
Using gsub removes the country only for the first row and issues a warning:
cities %>% mutate(new = gsub(x,"",y))
Warning message:
In gsub(x, "", y) :
argument 'pattern' has length > 1 and only the first element will be used
        x              y            new
1  FRANCE   Paris FRANCE          Paris
2 GERMANY Berlin GERMANY Berlin GERMANY
3  RUSSIA  Moscow RUSSIA  Moscow RUSSIA
Answer 0 (score: 2)
We can use stringr::str_replace, which is vectorised over both the string and the pattern:
library(tidyverse)
cities %>%
mutate_if(is.factor, as.character) %>%
mutate(new = trimws(str_replace(y, x, "")))
#        x              y    new
#1  FRANCE   Paris FRANCE  Paris
#2 GERMANY Berlin GERMANY Berlin
#3  RUSSIA  Moscow RUSSIA Moscow
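One caveat: str_replace treats each value of x as a regular expression, so a value containing characters such as "." or "(" could match more or less than intended. The following is a minimal sketch of a literal-match variant, assuming the dplyr and stringr packages are installed; it swaps in str_remove, fixed and str_trim from stringr, and rebuilds the example data with stringsAsFactors = FALSE so no factor conversion is needed.

```r
library(dplyr)
library(stringr)

# Same example data as in the question, stored as character columns
cities <- data.frame(
  x = c("FRANCE", "GERMANY", "RUSSIA"),
  y = c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA"),
  stringsAsFactors = FALSE
)

# fixed() makes stringr match x literally instead of as a regex;
# str_remove() drops the first match and str_trim() strips the leftover space
result <- cities %>%
  mutate(new = str_trim(str_remove(y, fixed(x))))

result$new
```

For the sample data this should give the same new column as the answer above; the difference only matters when x contains regex metacharacters.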
Answer 1 (score: 1)
Here is a base R solution:
x <- c("FRANCE","GERMANY","RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
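cities <- data.frame(x, y, stringsAsFactors = FALSE)
# Row by row: split both strings into words and keep the words of y that are not in x
cities$new <- mapply(function(a, b) {
  setdiff(strsplit(a, ' ')[[1]], strsplit(b, ' ')[[1]])
}, cities$y, cities$x, USE.NAMES = FALSE)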
Output:
        x              y    new
1  FRANCE   Paris FRANCE  Paris
2 GERMANY Berlin GERMANY Berlin
3  RUSSIA  Moscow RUSSIA Moscow
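If splitting into words is not an option (for example, when x itself contains spaces), a possible base R alternative is to pair each pattern with its own string. This is only a sketch under that assumption: remove_literal is a hypothetical helper name, and fixed = TRUE is used so that x is matched literally rather than as a regular expression.

```r
cities <- data.frame(
  x = c("FRANCE", "GERMANY", "RUSSIA"),
  y = c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA"),
  stringsAsFactors = FALSE
)

# sub()/gsub() only use the first element of a pattern vector, which is why
# the gsub attempt in the question changed row 1 only. mapply() calls the
# helper once per row, so each y gets its own x as the pattern.
remove_literal <- function(pattern, text) {
  trimws(sub(pattern, "", text, fixed = TRUE))
}

cities$new <- mapply(remove_literal, cities$x, cities$y, USE.NAMES = FALSE)
cities$new
```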
Hope this helps!