This is a toy version of my real data frame:
df <- data.frame(
sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s1", "s3", "s4"),
snp = c("snp1", "snp1", "snp1", "snp1", "snp1", "snp1", "snp2", "snp2", "snp2"),
random_column = 1:9
)
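I'm interested in counting the number of unique sample-snp pairs for each snp and returning that value on every row. In this case, s1 and s2 have snp1 (so size should be 2 for all the repeated rows, 1-6), and s1, s3 and s4 have snp2 (so size should be 3 for rows 7-9). This would be the expected output: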
  sample random_column   snp  size
   (chr)         (int) (chr) (int)
1     s1             1  snp1     2
2     s1             2  snp1     2
3     s1             3  snp1     2
4     s2             4  snp1     2
5     s2             5  snp1     2
6     s2             6  snp1     2
7     s1             7  snp2     3
8     s3             8  snp2     3
9     s4             9  snp2     3
I think I could do the following and then some kind of left join, but I'm wondering if there is a simpler way:
library(dplyr)
df[!duplicated(df[, c('sample', 'snp')]), ] %>% group_by(snp) %>% summarize(size = n())
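
For what it's worth, one possible way to skip the join is sketched below. It assumes dplyr is available and relies on `n_distinct()` inside a grouped `mutate()`, which counts the distinct samples within each snp and repeats that count on every row of the group. The name `result` is just a placeholder for illustration, and I have only checked this against the toy data above, not the real data frame.

```r
library(dplyr)

# Toy data, same as in the question
df <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s1", "s3", "s4"),
  snp = c("snp1", "snp1", "snp1", "snp1", "snp1", "snp1", "snp2", "snp2", "snp2"),
  random_column = 1:9
)

# Group by snp, count the distinct samples in each group,
# and write that count back to every row of the group
result <- df %>%
  group_by(snp) %>%
  mutate(size = n_distinct(sample)) %>%
  ungroup()

print(result)
```

On the toy data, `size` comes out as 2 for rows 1-6 and 3 for rows 7-9. Because `mutate()` keeps all rows and columns, no deduplication step or left join is needed; the new column is simply appended after `random_column`.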