Suppose I have the following data frame.
table<-data.frame(group=c(0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40),plan=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),price=c(1,4,5,6,8,9,12,12,12,3,5,6,7,10,12,20,20,20,5,6,8,12,15,20,22,28,28))
group plan price
1 0 1 1
2 5 1 4
3 10 1 5
4 15 1 6
5 20 1 8
6 25 1 9
7 30 1 12
8 35 1 12
9 40 1 12
10 0 2 3
11 5 2 5
12 10 2 6
13 15 2 7
14 20 2 10
15 25 2 12
16 30 2 20
17 35 2 20
18 40 2 20
19 0 3 5
20 5 3 6
21 10 3 8
22 15 3 12
23 20 3 15
24 25 3 20
25 30 3 22
26 35 3 28
27 40 3 28
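For each "plan", I want to merge the records whose "group" is greater than 20 two by two, replacing each pair with the mean of its prices (in the example below, 20 and 25 are combined into 20, and 30 and 35 into 30). When the maximum price is repeated, the last row should be dropped so that it does not appear twice.
The following example shows the expected result.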
data.frame(group=c(0,5,10,15,20,30,0,5,10,15,20,30,0,5,10,15,20,30,40),plan=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3),price=c(1,4,5,6,8.5,12,3,5,6,7,11,20,5,6,8,12,17.5,25,28))
group plan price
1 0 1 1.0
2 5 1 4.0
3 10 1 5.0
4 15 1 6.0
5 20 1 8.5
6 30 1 12.0
7 0 2 3.0
8 5 2 5.0
9 10 2 6.0
10 15 2 7.0
11 20 2 11.0
12 30 2 20.0
13 0 3 5.0
14 5 3 6.0
15 10 3 8.0
16 15 3 12.0
17 20 3 17.5
18 30 3 25.0
19 40 3 28.0
Thanks!
Answer 0 (score: 2)
You can try this with the dplyr package:
library(dplyr)
table %>%
  ## Groups below 20 keep their value; from 20 up, round down to the tens
  ## (20 and 25 -> 20, 30 and 35 -> 30, 40 -> 40)
  mutate(group = ifelse(group < 20, group, 10 * floor(group / 10))) %>%
  group_by(plan, group) %>%
  summarise(price = mean(price)) %>%
  ## Keep the last row per plan only if its price differs from the previous average price
  group_by(plan) %>%
  filter(!(row_number() == n() & price == lag(price)))
This returns:
plan group price
<dbl> <dbl> <dbl>
1 1 0 1.0
2 1 5 4.0
3 1 10 5.0
4 1 15 6.0
5 1 20 8.5
6 1 30 12.0
7 2 0 3.0
8 2 5 5.0
9 2 10 6.0
10 2 15 7.0
11 2 20 11.0
12 2 30 20.0
13 3 0 5.0
14 3 5 6.0
15 3 10 8.0
16 3 15 12.0
17 3 20 17.5
18 3 30 25.0
19 3 40 28.0
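For readers without dplyr, the same three steps (bucket, average, drop a repeated last row) can be sketched in base R. This is only an illustrative sketch, not part of the answer above: the names `dat`, `bucket`, `agg`, `is_last`, `prev_price` and `res` are made up for the example, and the data frame is rebuilt with `rep()` and `seq()` instead of being typed out.

```r
# Same data as in the question, built compactly (dat is an illustrative name)
dat <- data.frame(
  group = rep(seq(0, 40, by = 5), times = 3),
  plan  = rep(1:3, each = 9),
  price = c(1, 4, 5, 6, 8, 9, 12, 12, 12,
            3, 5, 6, 7, 10, 12, 20, 20, 20,
            5, 6, 8, 12, 15, 20, 22, 28, 28)
)

# Step 1: groups below 20 stay as they are; from 20 up, round down to the tens
dat$bucket <- ifelse(dat$group < 20, dat$group, 10 * floor(dat$group / 10))

# Step 2: mean price per plan and bucket, sorted by plan then bucket
agg <- aggregate(price ~ plan + bucket, data = dat, FUN = mean)
agg <- agg[order(agg$plan, agg$bucket), ]

# Step 3: drop the last row of a plan when its price repeats the previous one
is_last    <- !duplicated(agg$plan, fromLast = TRUE)
prev_price <- ave(agg$price, agg$plan, FUN = function(p) c(NA, head(p, -1)))
res <- agg[!(is_last & !is.na(prev_price) & prev_price == agg$price), ]

res
```

The explicit `!is.na(prev_price)` guard is a design choice of this sketch: it keeps a plan that has only one row, whereas a bare comparison against a missing previous price would yield `NA`.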
Answer 1 (score: 1)
How about:
dat<-data.frame(group=c(0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40),plan=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),price=c(1,4,5,6,8,9,12,12,12,3,5,6,7,10,12,20,20,20,5,6,8,12,15,20,22,28,28))
s <- split(dat, ifelse(dat$group>20, ">20", "<=20"))
s20 <- s[[">20"]] # easier to read
tens <- which(s20$group %% 10 == 0)
tens
# [1] 2 4 6 8 10 12
subgroup <- rep(1:length(tens), each = nrow(s20)/length(tens)) # can handle different freqs
subgroup
# [1] 1 1 2 2 3 3 4 4 5 5 6 6
ToAddBack <- s20[tens,]
ToAddBack[,"price"] <- aggregate(s20$price, by = list(subgroup), mean)[2]
newdat <- rbind(s[["<=20"]], ToAddBack)
finaldat <- newdat[order(newdat$plan, newdat$group),]
The resulting finaldat differs slightly from your example, because I think you accidentally left out some rows:
finaldat
group plan price
1 0 1 1.0
2 5 1 4.0
3 10 1 5.0
4 15 1 6.0
5 20 1 8.0
7 30 1 10.5
9 40 1 12.0
10 0 2 3.0
11 5 2 5.0
12 10 2 6.0
13 15 2 7.0
14 20 2 10.0
16 30 2 16.0
18 40 2 20.0
19 0 3 5.0
20 5 3 6.0
21 10 3 8.0
22 15 3 12.0
23 20 3 15.0
25 30 3 21.0
27 40 3 28.0
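The gap between this output and the asker's expected result seems to come from which rows are paired: splitting on `group > 20` leaves 20 alone and pairs 25 with 30 and 35 with 40, while the expected result pairs 20 with 25 and 30 with 35. A minimal sketch for plan 1 only, with the hypothetical names `down`, `up`, `by_down` and `by_up`, puts the two pairings side by side:

```r
# Plan 1 rows from the question's data
group <- seq(0, 40, by = 5)
price <- c(1, 4, 5, 6, 8, 9, 12, 12, 12)

# Pairing implied by the expected output: round down (20 with 25, 30 with 35)
down <- ifelse(group < 20, group, 10 * floor(group / 10))

# Pairing produced by the split on group > 20: round up (25 with 30, 35 with 40)
up <- ifelse(group <= 20, group, 10 * ceiling(group / 10))

by_down <- tapply(price, down, mean)
by_up   <- tapply(price, up, mean)

by_down
by_up
```

With rounding down, group 20 averages to 8.5 and group 30 to 12; with rounding up, group 20 stays at 8 and group 30 becomes 10.5, which matches the two outputs shown in this thread.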