Question

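I have two columns and need a third column that "subtracts" one from the other using dplyr. Below is a very simple example for clarity. In my actual case, a split/separate approach does not work.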

 x <- c("FRANCE","GERMANY","RUSSIA")
 y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
 cities <- data.frame(x,y)

 cities
        x              y
1  FRANCE   Paris FRANCE
2 GERMANY Berlin GERMANY
3  RUSSIA  Moscow RUSSIA

Expected result:

        x              y    new
1  FRANCE   Paris FRANCE  Paris
2 GERMANY Berlin GERMANY Berlin
3  RUSSIA  Moscow RUSSIA Moscow

What I have tried so far (to no avail):

This returns the same df, but the new column holds the country with the city removed (the opposite of what I want):

 cities %>% mutate(new = setdiff(x,y))

            x              y     new
    1  FRANCE   Paris FRANCE  FRANCE
    2 GERMANY Berlin GERMANY GERMANY
    3  RUSSIA  Moscow RUSSIA  RUSSIA

Conversely, setdiff with the arguments reversed just returns the original data:

 cities %>% mutate(new = setdiff(y,x))

            x              y            new
    1  FRANCE   Paris FRANCE   Paris FRANCE
    2 GERMANY Berlin GERMANY Berlin GERMANY
    3  RUSSIA  Moscow RUSSIA  Moscow RUSSIA

Using gsub for the removal works only for the first row and issues a warning:

  cities %>% mutate(new = gsub(x,"",y))

    Warning message:
    In gsub(x, "", y) :
      argument 'pattern' has length > 1 and only the first element will be used
            x              y            new
    1  FRANCE   Paris FRANCE         Paris 
    2 GERMANY Berlin GERMANY Berlin GERMANY
    3  RUSSIA  Moscow RUSSIA  Moscow RUSSIA

Answer 1

We can use stringr::str_replace:

library(tidyverse)
cities %>%
    mutate_if(is.factor, as.character) %>%
    mutate(new = trimws(str_replace(y, x, "")))
#        x              y    new
#1  FRANCE   Paris FRANCE  Paris
#2 GERMANY Berlin GERMANY Berlin
#3  RUSSIA  Moscow RUSSIA Moscow
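
One caveat with the approach above: str_replace interprets x as a regular expression, so values containing characters such as "." or "(" could match unexpectedly. A minimal sketch of a variant that treats x as a literal string, assuming stringr's fixed() modifier and using str_squish (in place of trimws) to tidy the leftover whitespace:

```r
library(dplyr)
library(stringr)

# Same toy data as in the question, built with character columns
cities <- data.frame(
  x = c("FRANCE", "GERMANY", "RUSSIA"),
  y = c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA"),
  stringsAsFactors = FALSE
)

# fixed(x) matches each element of x literally, row by row, against y;
# str_squish trims the ends and collapses any doubled spaces left behind
result <- cities %>%
  mutate(new = str_squish(str_remove(y, fixed(x))))

print(result)
```

Like the original, this removes only the first occurrence of x within each y.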

Answer 2

Here is a base R solution:

x <- c("FRANCE","GERMANY","RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
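cities <- data.frame(x, y, stringsAsFactors = FALSE)

# For each row, split both strings into words and keep the words of y not found in x;
# paste() rejoins them in case more than one word remains
cities$new <- mapply(function(a, b) {
    paste(setdiff(strsplit(a, " ")[[1]], strsplit(b, " ")[[1]]), collapse = " ")
}, cities$y, cities$x, USE.NAMES = FALSE)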

Output:

        x              y    new
1  FRANCE   Paris FRANCE  Paris
2 GERMANY Berlin GERMANY Berlin
3  RUSSIA  Moscow RUSSIA Moscow
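
As a side note, the gsub attempt in the question fails because gsub uses only the first element of its pattern argument. A rough sketch of a base R alternative that applies gsub once per row through mapply, without splitting on spaces; the helper name remove_literal is made up for illustration:

```r
# Same toy data as in the question, built with character columns
cities <- data.frame(
  x = c("FRANCE", "GERMANY", "RUSSIA"),
  y = c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: delete every literal occurrence of `pattern` from `text`,
# then trim the whitespace left at either end
remove_literal <- function(pattern, text) {
  trimws(gsub(pattern, "", text, fixed = TRUE))
}

# mapply pairs x[i] with y[i], so each row gets its own pattern
cities$new <- mapply(remove_literal, cities$x, cities$y, USE.NAMES = FALSE)

print(cities)
```

Because fixed = TRUE is set, the values in x are matched literally rather than as regular expressions.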

Hope this helps!

Subtracting two strings row-wise in an R data frame with dplyr
