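I would like to know the number of unique dams (mothers) that gave birth on each recorded birth date. My data frame looks similar to this: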
dam <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
"2009-10-01","2009-10-03","2009-10-03","2009-10-03")
mydf <- data.frame(dam,bdate)
mydf
# dam bdate
# 1 2A11 2009-10-01
# 2 2A11 2009-10-01
# 3 2A12 2009-10-01
# 4 2A12 2009-10-01
# 5 2A12 2009-10-01
# 6 4D23 2009-10-03
# 7 4D23 2009-10-03
# 8 1X23 2009-10-03
I used aggregate(dam ~ bdate, data=mydf, FUN=length), but it counts every record for a given date, so a dam with several births is counted more than once:
bdate dam
1 2009-10-01 5
2 2009-10-03 3
Instead, I need something like this:
mydf2
bdate dam
1 2009-10-01 2
2 2009-10-03 2
Thank you very much for your help!
Answer 0: (score: 12)
How about:
aggregate(dam ~ bdate, data=mydf, FUN=function(x) length(unique(x)))
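A minimal sketch of why this works, rebuilding the question's mydf so it runs on its own. aggregate passes FUN the dam values for one bdate at a time, so length alone counts rows, while length(unique(x)) counts distinct dams. The names n_rows and n_dams are introduced here only for illustration.

```r
# Rebuild the example data from the question
dam <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c(rep("2009-10-01", 5), rep("2009-10-03", 3))
mydf <- data.frame(dam, bdate)

# Rows per date: this is what FUN=length returns
n_rows <- aggregate(dam ~ bdate, data = mydf, FUN = length)

# Distinct dams per date: drop duplicates within each group before counting
n_dams <- aggregate(dam ~ bdate, data = mydf,
                    FUN = function(x) length(unique(x)))

n_rows$dam  # 5 3
n_dams$dam  # 2 2
```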
Answer 1: (score: 4)
You could also run unique on the data first:
aggregate(dam ~ bdate, data=unique(mydf[c("dam","bdate")]), FUN=length)
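You could then also use table instead of aggregate, although the output is slightly different (a named table of counts rather than a data frame).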
> table(unique(mydf[c("dam","bdate")])$bdate)
2009-10-01 2009-10-03
2 2
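If the data-frame layout from the question is needed, the table can be converted back. This is a sketch only, assuming the column is named bdate as in the question's mydf; the names counts and mydf2 are used here for illustration.

```r
# Rebuild the example data from the question
dam <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c(rep("2009-10-01", 5), rep("2009-10-03", 3))
mydf <- data.frame(dam, bdate)

# Count the unique (dam, bdate) pairs per date
counts <- table(unique(mydf[c("dam", "bdate")])$bdate)

# as.data.frame on a one-way table yields two columns (Var1 and Freq);
# rename them to match the layout asked for in the question
mydf2 <- as.data.frame(counts, stringsAsFactors = FALSE)
names(mydf2) <- c("bdate", "dam")
mydf2
```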
Answer 2: (score: 3)
This is just an example of how to think about the problem, and one of the ways to solve it.
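# split the dam column (not the whole data frame) by date; otherwise length() below counts columns, not dams
split.mydf <- with(mydf, split(x = dam, f = bdate)) #each list element holds the dams for only one date.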
# it's just a matter of counting unique dams
unique.mydf <- lapply(X = split.mydf, FUN = unique)
#and then count the number of unique elements
unilen.mydf <- lapply(unique.mydf, length)
#you can do these two last steps in one go like so
lapply(split.mydf, FUN = function(x) length(unique(x)))
as.data.frame(unlist(unilen.mydf)) #data.frame is just a special list, so this is water to your mill
unlist(unilen.mydf)
2009-10-01 2
2009-10-03 2
Answer 3: (score: 0)
In dplyr, you can use n_distinct:
library(tidyverse)
mydf %>%
group_by(bdate) %>%
summarize(dam = n_distinct(dam))