I have the following data frame:
testdf <- structure(list(gene = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L), .Label = c("Actc1", "Cbx1"), class = "factor"),
p1 = structure(c(5L, 1L, 2L, 3L, 4L, 1L, 1L, 1L, 1L, 1L), .Label = c("BoneMarrow",
"Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor"),
p2 = structure(c(1L, 1L, 1L, 1L, 1L, 5L, 2L, 3L, 4L, 1L), .Label = c("Adipose",
"Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor")), .Names = c("gene",
"p1", "p2"), class = "data.frame", row.names = c(NA, -10L))
testdf
#> gene p1 p2
#> 1 Cbx1 Vertebral Adipose
#> 2 Cbx1 BoneMarrow Adipose
#> 3 Cbx1 Liver Adipose
#> 4 Cbx1 Pulmonary Adipose
#> 5 Cbx1 Umbilical Adipose
#> 6 Actc1 BoneMarrow Vertebral
#> 7 Actc1 BoneMarrow Liver
#> 8 Actc1 BoneMarrow Pulmonary
#> 9 Actc1 BoneMarrow Umbilical
#> 10 Actc1 BoneMarrow Adipose
What I want to do is group by gene and count how many distinct p1 values each gene has. The result should look like this:
Cbx1 5 #Vertebral, Bone Marrow, Liver, Pulmonary, Umbilical
Actc1 1 #Bone Marrow
I tried this, but it doesn't give me what I want:
library(dplyr)
testdf %>% group_by(gene) %>% mutate(n = n())
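For reference, n() counts the rows in each group, and both genes have 5 rows, so every row gets n = 5. The question needs the number of distinct p1 values per group instead. The base R sketch below contrasts the two quantities. It rebuilds only the gene and p1 columns of testdf (an assumption for brevity; p2 plays no role here).

```r
# Rebuild the gene and p1 columns of testdf (p2 is not needed here)
testdf <- data.frame(
  gene = factor(rep(c("Cbx1", "Actc1"), each = 5)),
  p1 = factor(c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
                rep("BoneMarrow", 5)))
)

# What n() measures: the number of rows in each gene group (5 for both)
rows_per_gene <- table(testdf$gene)
print(rows_per_gene)

# What the question asks for: the number of distinct p1 values per gene
distinct_per_gene <- lengths(lapply(split(testdf$p1, testdf$gene), unique))
print(distinct_per_gene)
```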
Answer 0 (score: 3)
Using aggregate:
aggregate(p1 ~ gene, testdf, function(x) length(unique(x)))
# gene p1
#1 Actc1 1
#2 Cbx1 5
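The desired output in the question also lists which p1 values were counted. One way to get that with the same aggregate pattern is to run a second aggregate that pastes the unique values together, then merge the two results by gene. This is a sketch, assuming the reconstructed gene/p1 columns below stand in for the full testdf; the p1_n and p1_values column names simply come from merge()'s suffixes argument.

```r
# Rebuild the gene and p1 columns of testdf
testdf <- data.frame(
  gene = factor(rep(c("Cbx1", "Actc1"), each = 5)),
  p1 = factor(c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
                rep("BoneMarrow", 5)))
)

# Number of distinct p1 values per gene (as in the answer above)
counts <- aggregate(p1 ~ gene, testdf, function(x) length(unique(x)))

# The distinct p1 values themselves, comma-separated, in order of appearance
values <- aggregate(p1 ~ gene, testdf,
                    function(x) paste(unique(as.character(x)), collapse = ", "))

# Combine both by gene; the duplicated p1 column gets the suffixes _n and _values
result <- merge(counts, values, by = "gene", suffixes = c("_n", "_values"))
print(result)
```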
Answer 1 (score: 2)
You can use n_distinct to count the unique values:
library(dplyr)
testdf %>% group_by(gene) %>% summarise(n = n_distinct(p1))
# A tibble: 2 x 2
# gene n
# <fctr> <int>
#1 Actc1 1
#2 Cbx1 5
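If the goal is instead to keep all ten rows and attach the count as a column, as the mutate attempt in the question does, replacing n() with n_distinct(p1) inside that mutate() call should do it. For a dependency-free comparison, here is a minimal base R sketch of the same per-row column using ave(), again on reconstructed gene/p1 columns. It groups row indices rather than p1 itself because ave() returns a vector of the same type as its first argument, so passing p1 directly would not yield integer counts.

```r
# Rebuild the gene and p1 columns of testdf
testdf <- data.frame(
  gene = factor(rep(c("Cbx1", "Actc1"), each = 5)),
  p1 = factor(c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
                rep("BoneMarrow", 5)))
)

# For each gene group of row indices, count the distinct p1 values in those rows;
# ave() recycles that single number back onto every row of the group
testdf$n <- ave(seq_len(nrow(testdf)), testdf$gene,
                FUN = function(i) length(unique(testdf$p1[i])))
print(testdf)
```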
Answer 2 (score: 1)
You can also use tapply:
with(testdf, tapply(p1, gene, function(x) length(unique(x))))
Actc1 Cbx1
1 5
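tapply returns a one-dimensional array named by gene rather than a data frame. If a two-column table like the other answers' output is preferred, a small conversion step works; this sketch again assumes reconstructed gene/p1 columns, and res_df is just an illustrative name.

```r
# Rebuild the gene and p1 columns of testdf
testdf <- data.frame(
  gene = factor(rep(c("Cbx1", "Actc1"), each = 5)),
  p1 = factor(c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
                rep("BoneMarrow", 5)))
)

# Distinct p1 count per gene, as in the answer above (a 1-d array named by gene)
res <- with(testdf, tapply(p1, gene, function(x) length(unique(x))))

# Turn the named array into a data frame with gene and n columns
res_df <- data.frame(gene = names(res), n = as.vector(res))
print(res_df)
```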