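This question is partly related to an earlier question here. I want to aggregate counts based on three columns and get the last event count for each group defined by the three variables date, id, and rdate. What I would like to end up with is the following: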
date rdate event
1 01-jan-90 08-jan-90 3
2 01-jan-90 15-jan-90 3
3 01-jan-90 01-jan-90 3
4 01-jan-90 22-jan-90 3
5 01-jan-90 29-jan-90 3
1.1 01-jan-90 08-jan-90 2
2.1 01-jan-90 15-jan-90 2
3.1 01-jan-90 01-jan-90 2
4.1 01-jan-90 22-jan-90 2
5.1 01-jan-90 29-jan-90 2
I tried this code, but it only gives me the group mean:
aa<-aggregate(event ~ id+rdate+date,data = mydf,FUN=mean)
The sample data is as follows:
structure(list(date = c("01-jan-90", "01-jan-90", "01-jan-90",
"01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90",
"01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90",
"01-jan-90", "01-jan-90", "02-jan-90", "02-jan-90", "02-jan-90",
"02-jan-90", "02-jan-90", "02-jan-90", "02-jan-90", "02-jan-90",
"02-jan-90", "02-jan-90"), id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), rdate = c("08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90",
"29-jan-90", "08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90",
"29-jan-90", "08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90",
"29-jan-90", "09-jan-90", "16-jan-90", "02-jan-90", "23-jan-90",
"30-jan-90", "09-jan-90", "16-jan-90", "02-jan-90", "23-jan-90",
"30-jan-90"), event = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L)), .Names = c("date",
"id", "rdate", "event"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "1.1", "2.1", "3.1", "4.1", "5.1", "1.2",
"2.2", "3.2", "4.2", "5.2", "6", "7", "8", "9", "10", "6.1",
"7.1", "8.1", "9.1", "10.1"))
Answer 0 (score: 1)
I think this is what you're after:
> ddply(d, .(id, date, rdate), summarise, event = tail(event, 1))
id date rdate event
1 1 01-jan-90 01-jan-90 3
2 1 01-jan-90 08-jan-90 3
3 1 01-jan-90 15-jan-90 3
4 1 01-jan-90 22-jan-90 3
5 1 01-jan-90 29-jan-90 3
6 2 02-jan-90 02-jan-90 2
7 2 02-jan-90 09-jan-90 2
8 2 02-jan-90 16-jan-90 2
9 2 02-jan-90 23-jan-90 2
10 2 02-jan-90 30-jan-90 2
If the row order matters, you can sort the result by the date columns afterwards.
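For anyone who would rather stay with the `aggregate()` call from the question, the same "last value per group" idea can probably be sketched in base R by swapping `FUN = mean` for a function that returns the final element. This is only a minimal sketch: `mydf` is rebuilt compactly here to match the sample data (the row names differ from the original `dput`), and `last_event` is just an illustrative name.

```r
# Rebuild the sample data from the question in compact form
# (same values as the dput output; row names are not preserved).
mydf <- data.frame(
  date  = rep(c("01-jan-90", "02-jan-90"), times = c(15, 10)),
  id    = rep(1:2, times = c(15, 10)),
  rdate = c(rep(c("08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90", "29-jan-90"), 3),
            rep(c("09-jan-90", "16-jan-90", "02-jan-90", "23-jan-90", "30-jan-90"), 2)),
  event = c(rep(1:3, each = 5), rep(1:2, each = 5)),
  stringsAsFactors = FALSE
)

# Keep the last event in each (id, rdate, date) group instead of the mean.
# Rows inside a group stay in their original order, so tail(x, 1) picks
# the event from the last row of that group.
last_event <- aggregate(event ~ id + rdate + date, data = mydf,
                        FUN = function(x) tail(x, 1))
print(last_event)
```

This relies on the rows already being in the order you consider chronological; if they are not, sort `mydf` first.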
Answer 1 (score: 0)
Not entirely sure what you're trying to do, but how about something like this?
library(plyr)
ddply(mydf, .(id, date, rdate), summarise,
date = tail(date, 1),
id = tail(id, 1),
rdate = tail(rdate, 1),
mean = mean(event))
Output:
> library(plyr)
> ddply(mydf, .(id, date, rdate), summarise,
+ date = tail(date, 1),
+ id = tail(id, 1),
+ rdate = tail(rdate, 1),
+ mean = mean(event))
date id rdate mean
1 01-jan-90 1 01-jan-90 2.0
2 01-jan-90 1 08-jan-90 2.0
3 01-jan-90 1 15-jan-90 2.0
4 01-jan-90 1 22-jan-90 2.0
5 01-jan-90 1 29-jan-90 2.0
6 02-jan-90 2 02-jan-90 1.5
7 02-jan-90 2 09-jan-90 1.5
8 02-jan-90 2 16-jan-90 1.5
9 02-jan-90 2 23-jan-90 1.5
10 02-jan-90 2 30-jan-90 1.5
>