How can I get the frequency of events by group?

Asked: 2017-06-15 22:25:26

Tags: r

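I have daily egg_number data for each breeding pair (Parents). From the data "egg_output1" below, I am trying to determine the mean time (in days) between egg occurrences, grouped by parents.

In other words, after sorting the data chronologically, I want the mean difference between the dates of consecutive rows within each Parents group. Is this possible?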

   Tank     Parents                         date            egg_number
1: P3-T25   DON_AGEM_031FXDON_AGEM_036M     2017-06-03      2
2: P3-T25   DON_AGEM_031FXDON_AGEM_036M     2017-06-03      1
3: P3-T25   DON_AGEM_031FXDON_AGEM_036M     2017-05-23      1

I tried the following code:

as.Date(egg_output1$date)
egg <- egg_output1[order(egg_output1$date),]
ddply(
  egg, 
  c("Parents"), 
  summarize,
  average = mean(diff(date))
)

But this returns NA with the following warnings:

Warning messages:
1: In mean.default(diff(date)) : argument is not numeric or logical: returning NA
2: In mean.default(diff(date)) : argument is not numeric or logical: returning NA
3: In mean.default(diff(date)) : argument is not numeric or logical: returning NA

Sample data:

eggs <- data.frame(
  parents = sample(c("005Fx001M", "008Fx006M","028Fx026M"), 10, replace = TRUE),
  date = sample(seq(as.Date('2016/01/01'), as.Date('2017/01/01'), by="day"), 10),
  egg_number = sample(c("1", "2"), 10, replace = TRUE))

1 answer:

Answer 0: (score: 0)

Adapted from "Calculating difference row values by group in R":

> dt=NULL
> dt$Tank=rep("P3-T25",3)
> dt$Parents=rep("DON_AGEM_031FXDON_AGEM_036M",3)
> dt$Date=c("2017-06-3","2017-06-3","2017-05-3")
> dt$egg_number=c(2,1,1)
> dt=as.data.frame(dt)
> dt
    Tank                     Parents      Date egg_number
1 P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-3          2
2 P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-3          1
3 P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-05-3          1


> library(data.table)
> dt=data.table(dt)
> setkey(dt,Parents)
> library(lubridate)
> dt$Date=ymd(dt$Date)
> dt[,diff:=c(NA,diff(Date)),by=Parents]
> dt
     Tank                     Parents       Date egg_number diff
1: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-03          2   NA
2: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-03          1    0
3: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-05-03          1  -31
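
The output above stops at the row-wise differences, and because the rows are not sorted by date, one gap is negative (-31). A minimal sketch of the remaining steps, assuming data.table is installed and that the date column holds ISO-formatted strings: convert the column and assign it back, sort within each pair, then average the gaps per pair. The pair labels and dates below are hypothetical, chosen only to make the example self-contained.

```r
library(data.table)

# Hypothetical rows laid out like egg_output1 in the question.
egg_output1 <- data.table(
  Tank       = c("P3-T25", "P3-T25", "P3-T25", "P3-T26", "P3-T26"),
  Parents    = c("A_FxA_M", "A_FxA_M", "A_FxA_M", "B_FxB_M", "B_FxB_M"),
  date       = c("2017-06-03", "2017-06-03", "2017-05-23",
                 "2017-05-01", "2017-05-11"),
  egg_number = c(2, 1, 1, 1, 2)
)

# The result of as.Date() must be assigned back; a bare call
# leaves the column unchanged.
egg_output1[, date := as.Date(date)]

# Sort chronologically within each pair so diff() gives non-negative gaps.
setorder(egg_output1, Parents, date)

# Mean gap in days per pair. mean_days counts every consecutive row,
# so rows sharing a date contribute gaps of 0; mean_days_distinct
# counts each laying date once.
mean_gap <- egg_output1[, .(
  mean_days          = mean(as.numeric(diff(date))),
  mean_days_distinct = mean(as.numeric(diff(unique(date))))
), by = Parents]

print(mean_gap)
```

Whether same-day rows should count as zero-day gaps depends on what "time between egg occurrences" means for the analysis, so both variants are shown. A pair with a single date has no gaps to average and will yield NaN.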