How to get the number of unique rows in a data frame (on a subset of columns) using dplyr?

Posted: 2016-06-20 19:44:26

Tags: r dataframe dplyr

Here is a toy version of my real data frame:

df <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s1",  "s3", "s4"),
  snp = c("snp1", "snp1", "snp1", "snp1", "snp1", "snp1", "snp2", "snp2", "snp2"),
  random_column = 1:9
)

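I want to count the number of unique sample-snp pairs for each snp (that is, how many distinct samples carry that snp) and return that value on every row. In this case, s1 and s2 have snp1, so size should be 2 for all of the duplicated rows 1-6; s1, s3 and s4 have snp2, so size should be 3 for rows 7-9. This is the expected output: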

  sample random_column   snp  size
   (chr)         (int) (chr) (int)
1     s1             1  snp1     2
2     s1             2  snp1     2
3     s1             3  snp1     2
4     s2             4  snp1     2
5     s2             5  snp1     2
6     s2             6  snp1     2
7     s1             7  snp2     3
8     s3             8  snp2     3
9     s4             9  snp2     3
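As a quick cross-check of the expected size column, here is a base-R sketch (no dplyr) of the rule described above. It assumes stringsAsFactors = FALSE so that sample and snp are plain character vectors; ave() is a base R function that applies FUN within each group and writes the per-group result back onto every row of that group.

```r
# Toy data from the question (stringsAsFactors = FALSE is an assumption
# added here so the columns stay character on older R versions)
df <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s1", "s3", "s4"),
  snp = c("snp1", "snp1", "snp1", "snp1", "snp1", "snp1", "snp2", "snp2", "snp2"),
  random_column = 1:9,
  stringsAsFactors = FALSE
)

# Split the row indices by snp, count the distinct samples in each group,
# and broadcast that count back onto every row of the group.
df$size <- ave(seq_len(nrow(df)), df$snp,
               FUN = function(idx) length(unique(df$sample[idx])))

print(df)
```

Because ave() keeps the input's type, the result is an integer column aligned with the original row order.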

I think I could do the following and then follow it with some kind of left join, but I wonder if there is a simpler way:

df[!duplicated(df[,c('sample','snp')]),] %>% group_by(snp) %>% summarize(size = n())
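A minimal sketch of both routes, assuming a reasonably recent dplyr and stringsAsFactors = FALSE. The first part completes the join-based idea above with left_join(); the second is a join-free alternative that computes the same count with n_distinct() inside a grouped mutate(), so every original row is kept and no second table is needed. Both left_join() and n_distinct() are standard dplyr functions; the names counts, joined and simpler are just illustrative.

```r
library(dplyr)

# Toy data from the question (stringsAsFactors = FALSE assumed)
df <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s1", "s3", "s4"),
  snp = c("snp1", "snp1", "snp1", "snp1", "snp1", "snp1", "snp2", "snp2", "snp2"),
  random_column = 1:9,
  stringsAsFactors = FALSE
)

# 1) Join-based approach: keep one row per sample-snp pair, count the
#    pairs per snp, then attach those counts back to every row by snp.
counts <- df[!duplicated(df[, c("sample", "snp")]), ] %>%
  group_by(snp) %>%
  summarize(size = n())
joined <- left_join(df, counts, by = "snp")

# 2) Join-free alternative: within each snp group, count the distinct
#    samples and store that value on every row of the group.
simpler <- df %>%
  group_by(snp) %>%
  mutate(size = n_distinct(sample)) %>%
  ungroup()

print(joined)
print(simpler)
```

The grouped mutate() is usually the more direct choice here, since summarize() collapses each group to one row (which is why the first route needs a join), while mutate() keeps the original rows.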
