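I am trying to add a new column to a data frame based on the levels of one (or several) factors. I start with a data frame containing two factors and one variable: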
library(dplyr)
test <- data_frame(one = letters[1:5], two = LETTERS[1:5], three = 6:10)
I want to add a new column, four, that holds values for certain levels of one and two. For convenience, I keep these new values in their own small tables:
new_fourth_a <- data_frame(one = "b", four = 47)
new_fourth_b <- data_frame(two = c("C","E"), four = 42)
The correct result would be:
    one   two three  four
  (chr) (chr) (int) (dbl)
1     a     A     6    NA
2     b     B     7    47
3     c     C     8    42
4     d     D     9    NA
5     e     E    10    42
The best approach I could come up with is left_join():
test %>%
left_join(new_fourth_a, by = "one") %>%
left_join(new_fourth_b, by = "two")
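But this leaves me with duplicated four columns (four.x and four.y). That might be a good thing: it makes it easy to check whether any of the joins introduced multiple values for the new column (i.e. check that each row has only one non-NA value across all columns starting with four.). Still, I think there must be a simpler way?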
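The check described above could be sketched roughly as follows. This is only an illustration, not part of the original question: it uses tibble() in place of data_frame() (assuming a recent dplyr, where data_frame() is deprecated), and the names joined, four_cols, n_values and result are made up for the example.

```r
library(dplyr)

test <- tibble(one = letters[1:5], two = LETTERS[1:5], three = 6:10)
new_fourth_a <- tibble(one = "b", four = 47)
new_fourth_b <- tibble(two = c("C", "E"), four = 42)

joined <- test %>%
  left_join(new_fourth_a, by = "one") %>%
  left_join(new_fourth_b, by = "two")

# The second join sees two columns named `four`, so dplyr
# disambiguates them with its default suffixes: four.x and four.y
four_cols <- select(joined, starts_with("four."))

# Count the non-NA candidates per row; more than one would mean
# that two lookup tables disagree about the same row
n_values <- rowSums(!is.na(four_cols))
stopifnot(all(n_values <= 1))

# Once the check passes, collapse the candidates into a single column
result <- joined %>%
  mutate(four = coalesce(four.x, four.y)) %>%
  select(-four.x, -four.y)
```

If a row did receive values from both tables, stopifnot() would stop with an error instead of silently picking one of them.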
Answer 0: (score: 2)
Here is a solution using joins:
library(dplyr)
test <- data_frame(one = letters[1:5], two = LETTERS[1:5], three = 6:10)
new_fourth_a <- data_frame(one = "b", extra_a = 47)
new_fourth_b <- data_frame(two = c("C","E"), extra_b = 42)
test %>%
left_join(new_fourth_a, by = "one") %>%
left_join(new_fourth_b, by = "two") %>%
mutate(four = pmax(extra_a, extra_b, na.rm = TRUE)) %>%
select(-extra_a, -extra_b)
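If you want to handle an arbitrary number of lookup tables, you can process them one at a time instead, with later tables overriding earlier ones wherever they supply a value: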
library(dplyr)
test <- data_frame(one = letters[1:5], two = LETTERS[1:5], three = 6:10)
new_fourth_a <- data_frame(one = "b", extra = 47)
new_fourth_b <- data_frame(two = c("C","E"), extra = 42)
test %>%
left_join(new_fourth_a, by = "one") %>%
mutate(four = extra) %>%
select(-extra) %>%
left_join(new_fourth_b, by = "two") %>%
mutate(four = ifelse(is.na(extra), four, extra)) %>%
select(-extra)
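For more than a couple of lookup tables, the one-at-a-time pattern above might be folded over a list with Reduce(). This is a minimal sketch under some assumptions: every lookup table has exactly one key column plus a value column called extra, add_lookup is a hypothetical helper name, and tibble() stands in for data_frame().

```r
library(dplyr)

test <- tibble(one = letters[1:5], two = LETTERS[1:5], three = 6:10)

lookups <- list(
  tibble(one = "b", extra = 47),
  tibble(two = c("C", "E"), extra = 42)
)

# Hypothetical helper: join one lookup table, then fold its `extra`
# column into `four`. A non-NA `extra` wins, matching the
# ifelse(is.na(extra), four, extra) step used above.
add_lookup <- function(df, lookup) {
  key <- setdiff(names(lookup), "extra")
  df %>%
    left_join(lookup, by = key) %>%
    mutate(four = coalesce(extra, four)) %>%
    select(-extra)
}

# Start from an all-NA `four` column and apply each lookup in turn
result <- Reduce(add_lookup, lookups, mutate(test, four = NA_real_))
```

Swapping the arguments to coalesce(four, extra) would make the first table that matches win instead of the last.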
Answer 1: (score: 1)
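We can use %in% and some arithmetic to build a numeric index into c(NA, 47, 42), which creates four with the values 47 and 42. Index 1 (no match) gives NA, index 2 (one is "b") gives 47, and index 3 (two is "C" or "E") gives 42: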
test %>%
mutate(four = c(NA, 47, 42)[1+(one %in% 'b') +
2*(two %in% c('C', 'E'))])
# one two three four
# (chr) (chr) (int) (dbl)
#1 a A 6 NA
#2 b B 7 47
#3 c C 8 42
#4 d D 9 NA
#5 e E 10 42
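To see why the indexing works, the arithmetic can be broken into steps in base R. This is just an illustrative sketch; idx and both are names introduced for the example.

```r
one <- letters[1:5]
two <- LETTERS[1:5]

# TRUE/FALSE are coerced to 1/0, so each row gets
# 1 (no match), 2 (matches "b") or 3 (matches "C" or "E")
idx <- 1 + (one %in% "b") + 2 * (two %in% c("C", "E"))
idx
# 1 2 3 1 3

four <- c(NA, 47, 42)[idx]

# A row matching both conditions would get index 4, which lies
# past the end of the vector, so R returns NA for it
both <- c(NA, 47, 42)[4]
```

Because an overlapping row silently becomes NA, this approach fits best when the conditions are known not to overlap.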