R - Data frames and row operations

Date: 2017-06-02 18:34:02

Tags: r dataframe

Suppose I have the following data frame.

table<-data.frame(group=c(0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40),plan=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),price=c(1,4,5,6,8,9,12,12,12,3,5,6,7,10,12,20,20,20,5,6,8,12,15,20,22,28,28))

   group plan price
1      0    1     1
2      5    1     4
3     10    1     5
4     15    1     6
5     20    1     8
6     25    1     9
7     30    1    12
8     35    1    12
9     40    1    12
10     0    2     3
11     5    2     5
12    10    2     6
13    15    2     7
14    20    2    10
15    25    2    12
16    30    2    20
17    35    2    20
18    40    2    20
19     0    3     5
20     5    3     6
21    10    3     8
22    15    3    12
23    20    3    15
24    25    3    20
25    30    3    22
26    35    3    28
27    40    3    28

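So, I want to combine rows so that, for each "plan", the records with "group" greater than 20 are grouped two by two (each price averaged with the next record's), and when the maximum price is repeated, the trailing repeat is dropped so it does not appear twice.

The following example shows what the result should look like.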

data.frame(group=c(0,5,10,15,20,30,0,5,10,15,20,30,0,5,10,15,20,30,40),plan=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3),price=c(1,4,5,6,8.5,12,3,5,6,7,11,20,5,6,8,12,17.5,25,28))

   group plan price
1      0    1   1.0
2      5    1   4.0
3     10    1   5.0
4     15    1   6.0
5     20    1   8.5
6     30    1  12.0
7      0    2   3.0
8      5    2   5.0
9     10    2   6.0
10    15    2   7.0
11    20    2  11.0
12    30    2  20.0
13     0    3   5.0
14     5    3   6.0
15    10    3   8.0
16    15    3  12.0
17    20    3  17.5
18    30    3  25.0
19    40    3  28.0

Thanks!

2 answers:

Answer 0 (score: 2)

You can try this with the dplyr package:

library(dplyr)

table %>%
  group_by(plan) %>%
  # Leave groups below 20 as they are; round 20 and above down to the nearest ten
  mutate(group = ifelse(group < 20, group, 10 * floor(group / 10))) %>%
  # Average the prices that now share the same plan and group
  group_by(plan, group) %>%
  summarise(price = mean(price)) %>%
  # Keep the last row of each plan only if its price differs from the previous average price
  group_by(plan) %>%
  filter(!(row_number() == n() & price == lag(price)))

This returns:

    plan group price
   <dbl> <dbl> <dbl>
 1     1     0   1.0
 2     1     5   4.0
 3     1    10   5.0
 4     1    15   6.0
 5     1    20   8.5
 6     1    30  12.0
 7     2     0   3.0
 8     2     5   5.0
 9     2    10   6.0
10     2    15   7.0
11     2    20  11.0
12     2    30  20.0
13     3     0   5.0
14     3     5   6.0
15     3    10   8.0
16     3    15  12.0
17     3    20  17.5
18     3    30  25.0
19     3    40  28.0
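
For readers without dplyr, the same steps can be sketched in base R. This is only an illustration under the assumption that rows from group 20 upward are paired, as the expected output in the question suggests; the names `dat`, `bin`, `agg`, `result` and the helper `drop_repeated_tail` are made up here and do not come from the answer.

```r
# Same input as in the question, built with rep() and seq() for brevity.
dat <- data.frame(
  group = rep(seq(0, 40, by = 5), times = 3),
  plan  = rep(1:3, each = 9),
  price = c(1, 4, 5, 6, 8, 9, 12, 12, 12,
            3, 5, 6, 7, 10, 12, 20, 20, 20,
            5, 6, 8, 12, 15, 20, 22, 28, 28)
)

# Bin groups of 20 and above into tens: 20 and 25 -> 20, 30 and 35 -> 30, 40 -> 40.
dat$bin <- ifelse(dat$group < 20, dat$group, 10 * floor(dat$group / 10))

# Average the price within each plan/bin combination, then sort by plan and bin.
agg <- aggregate(price ~ plan + bin, data = dat, FUN = mean)
agg <- agg[order(agg$plan, agg$bin), ]

# Hypothetical helper: drop the last row of a plan when its price
# repeats the price of the row before it.
drop_repeated_tail <- function(d) {
  n <- nrow(d)
  if (n > 1 && d$price[n] == d$price[n - 1]) d[-n, ] else d
}

result <- do.call(rbind, lapply(split(agg, agg$plan), drop_repeated_tail))
rownames(result) <- NULL
result
```

The `bin` column plays the role of the rewritten `group` column in the pipeline above. Comparing prices with `==` is fine for this data, but for prices that are not exactly representable a tolerance-based comparison would be safer.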

Answer 1 (score: 1)

How about:

dat<-data.frame(group=c(0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40),plan=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),price=c(1,4,5,6,8,9,12,12,12,3,5,6,7,10,12,20,20,20,5,6,8,12,15,20,22,28,28))

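# Split off the rows with group > 20; the remaining rows are left untouched
s <- split(dat, ifelse(dat$group > 20, ">20", "<=20"))
s20 <- s[[">20"]] # easier to read
# Positions of the rows whose group is a multiple of ten
tens <- which(s20$group %% 10 == 0)
tens
# [1]  2  4  6  8 10 12

# One subgroup per multiple of ten, each covering the same number of rows
subgroup <- rep(seq_along(tens), each = nrow(s20) / length(tens)) # can handle different freqs
subgroup
# [1] 1 1 2 2 3 3 4 4 5 5 6 6

# Keep the multiple-of-ten rows and overwrite their price with the subgroup mean
ToAddBack <- s20[tens, ]
ToAddBack[, "price"] <- aggregate(s20$price, by = list(subgroup), mean)[2]

# Recombine with the untouched rows and restore the plan/group order
newdat <- rbind(s[["<=20"]], ToAddBack)
finaldat <- newdat[order(newdat$plan, newdat$group), ]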

Note that finaldat differs slightly from your example, because I think you accidentally left out some rows:

finaldat
   group plan price
1      0    1   1.0
2      5    1   4.0
3     10    1   5.0
4     15    1   6.0
5     20    1   8.0
7     30    1  10.5
9     40    1  12.0
10     0    2   3.0
11     5    2   5.0
12    10    2   6.0
13    15    2   7.0
14    20    2  10.0
16    30    2  16.0
18    40    2  20.0
19     0    3   5.0
20     5    3   6.0
21    10    3   8.0
22    15    3  12.0
23    20    3  15.0
25    30    3  21.0
27    40    3  28.0
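
The two answers give different prices because they pair the rows differently. The sketch below puts the two pairings side by side for one plan's group values. Describing Answer 1's positional pairing as rounding up to the next ten is an interpretation that happens to hold for this data, not something the answer states, and the names `floor_bin` and `ceil_bin` are made up for illustration.

```r
group <- c(0, 5, 10, 15, 20, 25, 30, 35, 40)

# Answer 0: groups of 20 and above are rounded down, so 20 pairs with 25 and 30 with 35.
floor_bin <- ifelse(group < 20, group, 10 * floor(group / 10))

# Answer 1 (as interpreted here): groups above 20 are rounded up,
# so 25 pairs with 30 and 35 with 40, while 20 stays on its own.
ceil_bin <- ifelse(group <= 20, group, 10 * ceiling(group / 10))

data.frame(group, floor_bin, ceil_bin)
```

Only the rounding-down variant reproduces the 8.5, 11 and 17.5 shown for group 20 in the question's expected result.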