How to modify a single column with joins using dplyr

Date: 2016-03-07 07:51:48

Tags: r dplyr

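I am trying to add a new column to a data frame based on the levels of one (or several) factors. I start with a data frame containing two factors and one variable: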

library(dplyr)
test <- data_frame(one = letters[1:5], two = LETTERS[1:5], three = 6:10)

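I want to add a new column, four, that holds values for certain levels of one and two. For convenience, I keep these new values in their own small tables: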

new_fourth_a <- data_frame(one = "b", four = 47)
new_fourth_b <- data_frame(two = c("C","E"), four = 42)

The desired result is

    one   two three  four
  (chr) (chr) (int) (dbl)
1     a     A     6    NA
2     b     B     7    47
3     c     C     8    42
4     d     D     9    NA
5     e     E    10    42

The best approach I could come up with is left_join():

test %>% 
  left_join(new_fourth_a, by = "one") %>%
  left_join(new_fourth_b, by = "two")

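but this ends up duplicating the four column (as four.x and four.y). That might actually be a good thing: it makes it easy to check whether any of the joins introduced more than one value for the new column (i.e. to check that each row has only one non-NA value across all the columns whose names start with four.). Still, I think there must be a simpler way?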
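
The check described above can be sketched as follows. This is only one possible way to do it, not something from the original post: it counts the non-NA values across the four.x and four.y columns that the second join produces, stops if any row has more than one, and then collapses them with dplyr's coalesce(). The names joined, n_values and result are made up for illustration, and tibble() is used here as the current replacement for data_frame().

```r
library(dplyr)

test <- tibble(one = letters[1:5], two = LETTERS[1:5], three = 6:10)
new_fourth_a <- tibble(one = "b", four = 47)
new_fourth_b <- tibble(two = c("C", "E"), four = 42)

# The second join finds `four` on both sides, so it produces four.x and four.y
joined <- test %>%
  left_join(new_fourth_a, by = "one") %>%
  left_join(new_fourth_b, by = "two")

# Count the non-NA candidates per row; more than one means the joins conflict
n_values <- rowSums(!is.na(select(joined, starts_with("four."))))
stopifnot(all(n_values <= 1))

# Collapse the candidates into one column, taking the first non-NA value
result <- joined %>%
  mutate(four = coalesce(four.x, four.y)) %>%
  select(-four.x, -four.y)

result
```

With more than two lookup tables the suffixes get harder to predict, so giving each table's value column a distinct name before joining may be easier to manage.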

2 answers:

Answer 0: (score: 2)

Here is a solution using join:

library(dplyr)
test <- data_frame(one = letters[1:5], two = LETTERS[1:5], three = 6:10)
new_fourth_a <- data_frame(one = "b", extra_a = 47)
new_fourth_b <- data_frame(two = c("C","E"), extra_b = 42)
test %>% 
  left_join(new_fourth_a, by = "one") %>%
  left_join(new_fourth_b, by = "two") %>%
  mutate(four = pmax(extra_a, extra_b, na.rm = TRUE)) %>%
  select(-extra_a, -extra_b)

If you want to handle an arbitrary number of lookup tables, you can process them one at a time:

library(dplyr)
test <- data_frame(one = letters[1:5], two = LETTERS[1:5], three = 6:10)
new_fourth_a <- data_frame(one = "b", extra = 47)
new_fourth_b <- data_frame(two = c("C","E"), extra = 42)
test %>% 
  left_join(new_fourth_a, by = "one") %>%
  mutate(four = extra) %>%
  select(-extra) %>%
  left_join(new_fourth_b, by = "two") %>%
  mutate(four = ifelse(is.na(extra), four, extra)) %>%
  select(-extra)
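
The one-at-a-time pattern above could be wrapped in a small helper and folded over a list of lookup tables with base R's Reduce(). This is a rough sketch under a few assumptions, not part of the original answer: add_four and lookups are hypothetical names, every lookup table is assumed to carry its value in a column called extra, and every other column in it is treated as the join key. As in the answer's ifelse() step, a later table overrides an earlier one when both supply a value.

```r
library(dplyr)

test <- tibble(one = letters[1:5], two = LETTERS[1:5], three = 6:10)

# Assumption: each lookup table holds its key column(s) plus a value column `extra`
lookups <- list(
  tibble(one = "b", extra = 47),
  tibble(two = c("C", "E"), extra = 42)
)

# Hypothetical helper: join one lookup table and fold its `extra` into `four`
add_four <- function(df, lookup) {
  key <- setdiff(names(lookup), "extra")
  df %>%
    left_join(lookup, by = key) %>%
    mutate(four = coalesce(extra, four)) %>%  # a new value wins over the old one
    select(-extra)
}

# Start from an all-NA `four` column, then apply every lookup table in turn
result <- Reduce(add_four, lookups, mutate(test, four = NA_real_))

result
```

Swapping the arguments to coalesce(four, extra) would instead keep the first value found, so the order of the list decides which table takes precedence either way.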

Answer 1: (score: 1)

We can use %in% and some arithmetic to build a numeric index that fills the column four with the values 47 and 42, instead of creating two additional data_frames.

 test %>%
     mutate(four = c(NA, 47, 42)[1+(one %in% 'b') + 
                         2*(two %in% c('C', 'E'))])
 #   one   two three  four
 #  (chr) (chr) (int) (dbl)
 #1     a     A     6    NA
 #2     b     B     7    47
 #3     c     C     8    42
 #4     d     D     9    NA
 #5     e     E    10    42
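
To see why the indexing works, it may help to look at the intermediate index on its own. The sketch below uses plain vectors instead of the data frame; idx, four and idx_both are names invented for this illustration. It also shows a caveat that follows from the arithmetic: a row matching both conditions would get index 4, which lies past the end of the three-element vector, so R returns NA for it.

```r
one <- letters[1:5]
two <- LETTERS[1:5]

# TRUE/FALSE are treated as 1/0: no match -> 1, `one` matches -> 2, `two` matches -> 3
idx <- 1 + (one %in% "b") + 2 * (two %in% c("C", "E"))
idx
# [1] 1 2 3 1 3

four <- c(NA, 47, 42)[idx]
four
# [1] NA 47 42 NA 42

# A hypothetical row with one == "b" and two == "C" matches both conditions
idx_both <- 1 + ("b" %in% "b") + 2 * ("C" %in% c("C", "E"))
idx_both
# [1] 4
c(NA, 47, 42)[idx_both]  # index 4 is out of range, so the result is NA
# [1] NA
```

If overlapping matches are possible in your data, a fourth element could be added to the lookup vector to state explicitly which value such rows should get.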