Hi everyone!
I've run into a small challenge: creating a new variable for each unique ID based on grouped data.
Here is my dataset:
ID = c("1", "1", "2", "2", "2", "3", "4")
CAL_YEAR = c("2010", "2011", "2010", "2011", "2011", "2012", "2013")
T_F = c("T", "F", "F", "T", "F", "F", "T")
DF_1 = data.frame(ID, CAL_YEAR, T_F)
This should be my final output:
ID = c("1", "1", "2", "2", "2", "3", "4")
CAL_YEAR = c("2010", "2011", "2010", "2011", "2011", "2012", "2013")
T_F = c("T", "F", "F", "T", "F", "F", "T")
VAR_TF = c("T", "F", "F", "T + F", "T + F", "F", "T")
DF_2 = data.frame(ID, CAL_YEAR, T_F, VAR_TF)
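I'm looking for an elegant way to do the following:
For each unique ID, by CAL_YEAR: if T_F contains only "T" or only "F", then VAR_TF = "T" or "F", respectively.
My difficulty is with unique ID "2", CAL_YEAR "2011", where T_F contains both "T" and "F". In that case, I want VAR_TF to be "T + F" on both the "T" row and the "F" row.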
Answer 0 (score: 0)
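We can use ave, checking the number of unique elements of 'T_F' within each group defined by 'ID' and 'CAL_YEAR': if it is greater than 1, return "T + F"; otherwise return the original vector.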
DF_1$VAR_TF <- with(DF_1, ave(as.character(T_F), ID, CAL_YEAR,
    FUN = function(x) if (length(unique(x)) > 1) "T + F" else x))
identical(DF_1$VAR_TF, as.character(DF_2$VAR_TF))
#[1] TRUE
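To see which groups trigger the "T + F" branch, it can help to look at the per-group count of distinct values first. The following is a minimal base R sketch, assuming the question's data; the names n_unique and mixed are introduced here for illustration only and are not part of the original answer.

```r
ID <- c("1", "1", "2", "2", "2", "3", "4")
CAL_YEAR <- c("2010", "2011", "2010", "2011", "2011", "2012", "2013")
T_F <- c("T", "F", "F", "T", "F", "F", "T")

# For each row, count the distinct T_F values within its (ID, CAL_YEAR) group.
# Row indices are passed to ave() so the result stays numeric instead of
# being coerced to character.
n_unique <- ave(seq_along(T_F), ID, CAL_YEAR,
                FUN = function(i) length(unique(T_F[i])))

# TRUE for rows whose group holds more than one distinct value
mixed <- n_unique > 1
```

Only the two rows for ID "2" in 2011 have a count above 1, so only those rows are relabelled.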
Or use dplyr
library(dplyr)
DF_1 %>%
  group_by(ID, CAL_YEAR) %>%
  mutate(VAR_TF = if (n_distinct(T_F) > 1) "T + F" else as.character(T_F))
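Both approaches hard-code the "T + F" label. If the label should instead be built from whatever distinct values a group contains, one possible variation is sketched below in base R. The helper combine_values is hypothetical (not from the original answer), and it assumes the values should be joined in order of first appearance within the group.

```r
ID <- c("1", "1", "2", "2", "2", "3", "4")
CAL_YEAR <- c("2010", "2011", "2010", "2011", "2011", "2012", "2013")
T_F <- c("T", "F", "F", "T", "F", "F", "T")

# Hypothetical helper: join a group's distinct values with " + ",
# in order of first appearance, and repeat the label for every row.
combine_values <- function(x) {
  x <- as.character(x)
  rep(paste(unique(x), collapse = " + "), length(x))
}

# Apply the helper within each (ID, CAL_YEAR) group
VAR_TF <- ave(T_F, ID, CAL_YEAR, FUN = combine_values)
```

For the question's data this reproduces the desired VAR_TF column. Note that a group listing "F" before "T" would be labelled "F + T" under this assumption.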