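I have daily egg_number data for each breeding pair (Parents). From the data "egg_output1" below, I am trying to determine the mean time (in days) between egg occurrences, grouped by parents.
So basically, after sorting the data chronologically, I want the mean difference between the dates of consecutive rows within each Parents group. Is this possible?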
Tank Parents date egg_number
1: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-03 2
2: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-03 1
3: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-05-23 1
I tried the following code:
as.Date(egg_output1$date)
egg <- egg_output1[order(egg_output1$date),]
ddply(
egg,
c("Parents"),
summarize,
average = mean(diff(date))
)
But this returns NA, with the following warnings:
Warning messages:
1: In mean.default(diff(date)) : argument is not numeric or logical: returning NA
2: In mean.default(diff(date)) : argument is not numeric or logical: returning NA
3: In mean.default(diff(date)) : argument is not numeric or logical: returning NA
Sample data:
eggs <- data.frame(
parents = sample(c("005Fx001M", "008Fx006M","028Fx026M"), 10, replace = TRUE),
date = sample(seq(as.Date('2016/01/01'), as.Date('2017/01/01'), by="day"), 10),
egg_number = sample(c("1", "2"), 10, replace = TRUE))
Answer 0 (score: 0)
Adapted from "Calculating difference row values by group in R":
> dt=NULL
> dt$Tank=rep("P3-T25",3)
> dt$Parents=rep("DON_AGEM_031FXDON_AGEM_036M",3)
> dt$Date=c("2017-06-3","2017-06-3","2017-05-3")
> dt$egg_number=c(2,1,1)
> dt=as.data.frame(dt)
> dt
Tank Parents Date egg_number
1 P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-3 2
2 P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-3 1
3 P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-05-3 1
> library(data.table)
> dt=data.table(dt)
> setkey(dt,Parents)
> library(lubridate)
> dt$Date=ymd(dt$Date)
> dt[,diff:=c(NA,diff(Date)),by=Parents]
> dt
Tank Parents Date egg_number diff
1: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-03 2 NA
2: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-06-03 1 0
3: P3-T25 DON_AGEM_031FXDON_AGEM_036M 2017-05-03 1 -31
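The output above gives one difference per row, and the -31 appears because the rows were keyed by Parents but not sorted by date. To get a single mean per pair, the dates have to be sorted within each pair before differencing. Below is a minimal base R sketch of that, not a definitive implementation. It assumes the date column in egg_output1 is stored as text; the helper name mean_gap is made up for illustration. Note also that in the question's code the result of as.Date(egg_output1$date) is never assigned back, so the column is not actually converted.

```r
# The three rows shown in the question; dates kept as text on purpose
# (an assumption about how egg_output1 was read in).
egg_output1 <- data.frame(
  Tank = "P3-T25",
  Parents = "DON_AGEM_031FXDON_AGEM_036M",
  date = c("2017-06-03", "2017-06-03", "2017-05-23"),
  egg_number = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# Assign the conversion back; calling as.Date() alone leaves the column unchanged.
egg_output1$date <- as.Date(egg_output1$date)

# Sort by pair, then chronologically within each pair.
egg <- egg_output1[order(egg_output1$Parents, egg_output1$date), ]

# Hypothetical helper: mean gap in days between consecutive rows of one pair.
# A pair with a single row has no gap, so it gets NA.
mean_gap <- function(d) {
  if (length(d) < 2) return(NA_real_)
  mean(as.numeric(diff(d), units = "days"))
}

# One value per breeding pair.
gaps <- tapply(egg$date, egg$Parents, mean_gap)
print(gaps)
```

With these three rows the gaps are 11 and 0 days, so the mean is 5.5. If two rows on the same day should count as one egg event rather than a zero-day gap, calling mean_gap(unique(d)) instead would give 11 here; which is right depends on how the data were recorded.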