How do I create a variable based on the number of unique values in another data frame?

Posted: 2020-08-15 19:08:08

Tags: r

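Here is a simplified example of what I am trying to do.

Dataset 1 (DF1) holds data on apples (e.g. the size or number of holes), and dataset 2 (DF2) holds information on the worms found in them, including each worm's color and the apple it was found in. I want to add a variable to DF1 that holds the number of unique worm colors present in each apple.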

DF1<-data.frame(x=c("A1","A2","A3","A4","A5"),y=c(3,26,5,27,5))
DF2<-data.frame(Q=c("A1","A1","A1","A1","A1","A1","A2","A2","A3","A3","A3","A4","A5","A5","A5","A5"),R=c("red","red","blue","yellow","yellow","blue","orange","orange","green","red","red","blue","blue", "purple","black","red"),S=c(4,5,3,5,4,3,5,4,3,5,4,3,5,4,3,5))

I'm new to R, and while trying to solve this I came up with:

DF1$N.Colors<-length(unique(DF2$R[match(DF1$X,DF2$Q)]))

But it gives me a new variable filled with 0s instead of the vector I want:

 DF1$N.Colors<-c(3,1,2,1,4)
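Why the attempt produces zeros can be checked step by step. The sketch below reuses the question's data; the `tapply()` lookup at the end is one possible per-apple fix chosen for illustration (the name `counts` is hypothetical), not code from either answer.

```r
# Data from the question
DF1 <- data.frame(x = c("A1", "A2", "A3", "A4", "A5"), y = c(3, 26, 5, 27, 5))
DF2 <- data.frame(
  Q = c("A1","A1","A1","A1","A1","A1","A2","A2","A3","A3","A3","A4","A5","A5","A5","A5"),
  R = c("red","red","blue","yellow","yellow","blue","orange","orange",
        "green","red","red","blue","blue","purple","black","red"),
  S = c(4,5,3,5,4,3,5,4,3,5,4,3,5,4,3,5)
)

# Column names are case-sensitive: DF1 has "x", not "X", so DF1$X is NULL.
print(is.null(DF1$X))                               # TRUE
# match(NULL, ...) is integer(0), so the expression evaluates to 0,
# which is then recycled into every row of the new column.
print(length(unique(DF2$R[match(DF1$X, DF2$Q)])))   # 0
# Even with the lowercase name, match() only finds the FIRST DF2 row per apple,
# and length(unique(...)) returns one number for all apples, not one per apple.
print(length(unique(DF2$R[match(DF1$x, DF2$Q)])))   # 4

# Per-apple version: count distinct colors within each group of Q,
# then look the counts up by apple name, in DF1's row order.
counts <- tapply(DF2$R, DF2$Q, function(r) length(unique(r)))
DF1$N.Colors <- as.vector(counts[as.character(DF1$x)])
print(DF1$N.Colors)                                 # 3 1 2 1 4
```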

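Many thanks for your help.

2 Answers:

Answer 0 (score: 3)

This can be done by joining the two datasets on the 'Q' and 'x' columns, counting the unique values of 'R' for each apple, and assigning the result to a new column in 'DF1':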

library(data.table)
# setDT() converts DF2 to a data.table in place (by reference)
DF1$N.Colors <- setDT(DF2)[DF1, uniqueN(R), on = .(Q = x), by = .EACHI]$V1
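With `by = .EACHI`, data.table evaluates `uniqueN(R)` once for each row of `DF1`, using the `DF2` rows that join to it. The same per-row logic can be written in base R without the package. This is a minimal sketch assuming the question's data; using `vapply()` here is an illustrative choice, not part of the answer.

```r
# Data from the question
DF1 <- data.frame(x = c("A1", "A2", "A3", "A4", "A5"), y = c(3, 26, 5, 27, 5))
DF2 <- data.frame(
  Q = c("A1","A1","A1","A1","A1","A1","A2","A2","A3","A3","A3","A4","A5","A5","A5","A5"),
  R = c("red","red","blue","yellow","yellow","blue","orange","orange",
        "green","red","red","blue","blue","purple","black","red"),
  S = c(4,5,3,5,4,3,5,4,3,5,4,3,5,4,3,5)
)

# For each apple in DF1 (in DF1's own row order), keep the DF2 rows for that
# apple and count the distinct colors among them (0 if the apple has no worms).
DF1$N.Colors <- vapply(
  as.character(DF1$x),
  function(apple) length(unique(DF2$R[DF2$Q == apple])),
  integer(1),
  USE.NAMES = FALSE
)
print(DF1)
```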

Or, using the tidyverse:

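library(dplyr)
# Count distinct colors per apple, then join the counts onto DF1
# (right_join keeps every row of DF1; the result is returned, not assigned)
DF2 %>%
   group_by(x = Q) %>%
   summarise(N.Colors = n_distinct(R)) %>%
   right_join(DF1, by = "x")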

Answer 1 (score: 3)

A base R solution using aggregate() and merge():
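The code for this answer is not preserved here. Below is a minimal sketch of how an `aggregate()` plus `merge()` approach can look, assuming the question's data. It is not necessarily the answerer's original code, and the intermediate name `counts` is hypothetical.

```r
# Data from the question
DF1 <- data.frame(x = c("A1", "A2", "A3", "A4", "A5"), y = c(3, 26, 5, 27, 5))
DF2 <- data.frame(
  Q = c("A1","A1","A1","A1","A1","A1","A2","A2","A3","A3","A3","A4","A5","A5","A5","A5"),
  R = c("red","red","blue","yellow","yellow","blue","orange","orange",
        "green","red","red","blue","blue","purple","black","red"),
  S = c(4,5,3,5,4,3,5,4,3,5,4,3,5,4,3,5)
)

# aggregate() applies FUN to R within each value of Q: one row per apple
counts <- aggregate(R ~ Q, data = DF2, FUN = function(r) length(unique(r)))
names(counts) <- c("x", "N.Colors")   # rename to match DF1's key column
# merge() joins on x; all.x = TRUE keeps apples without worms (their count is NA)
DF1 <- merge(DF1, counts, by = "x", all.x = TRUE)
print(DF1)
```

Note that `merge()` sorts the result by the key column by default. In this example that matches DF1's original order, because the `x` values are already sorted.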