Find the running maximum by group

Date: 2015-12-03 15:14:10

Tags: r max groupwise-maximum

I need to find the running maximum of a variable by group using R. Within each group, the rows are sorted by time using df[order(df$group, df$time),].
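A running maximum depends on row order, so the sort has to happen before the maximum is computed. Here is a minimal base-R sketch of that step on a small hypothetical data frame `d`, which has the same columns as the question's df but made-up values in scrambled order:

```r
# Hypothetical toy data in scrambled row order (same columns as the question's df)
d <- data.frame(var   = c(4L, 5L, 3L, 2L),
                group = c("b", "a", "b", "a"),
                time  = c(2L, 1L, 1L, 2L),
                stringsAsFactors = FALSE)

# Sort by group first, then by time within each group
d <- d[order(d$group, d$time), ]
d
```

After sorting, the rows run a/1, a/2, b/1, b/2.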

My variable has some NAs, but I can handle them by replacing them with zeros.
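The replacement matters because `cummax()` propagates NA: once a missing value appears, every later result is NA. The sketch below uses a hypothetical vector `x` and assumes `var` is never negative, so a filler of 0 can never exceed a real observation:

```r
# Hypothetical vector with one missing value
x <- c(5L, NA, 3L, 8L, 4L)

# cummax() propagates NA: every value from the NA onward becomes NA
cummax(x)

# Replacing NA with 0 is safe only if var is never negative (assumed here)
x0 <- replace(x, is.na(x), 0L)
cummax(x0)   # 5 5 5 8 8
```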

This is what the data frame df looks like:

(df <- structure(list(var = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
               group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
                                 .Label = c("a", "b"), class = "factor"),
               time = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)),
          .Names = c("var", "group","time"),
          class = "data.frame", row.names = c(NA, -10L)))

#    var group time
# 1    5     a    1
# 2    2     a    2
# 3    3     a    3
# 4    4     a    4
# 5    0     a    5
# 6    3     b    1
# 7    6     b    2
# 8    4     b    3
# 9    8     b    4
# 10   4     b    5

I want a new variable curMax:

var  |  group  |  time  |  curMax
5       a         1         5
2       a         2         5
3       a         3         5
4       a         4         5
0       a         5         5
3       b         1         3
6       b         2         6
4       b         3         6
8       b         4         8
4       b         5         8
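To state the target precisely: `curMax` in row i is the largest `var` seen so far within that row's group. The deliberately naive loop below is quadratic and meant only for illustration. It rebuilds the question's data compactly, assumes the rows are already sorted by group and then time, and reproduces the table above:

```r
# The question's data, rebuilt compactly
df <- data.frame(var   = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
                 group = factor(rep(c("a", "b"), each = 5)),
                 time  = rep(1:5, times = 2))

# curMax[i] = max of var over rows 1..i that belong to the same group as row i
# (assumes rows are already sorted by group, then time)
curMax <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  same_group_so_far <- df$group[seq_len(i)] == df$group[i]
  curMax[i] <- max(df$var[seq_len(i)][same_group_so_far])
}
df$curMax <- curMax
df
```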

If you know how to do this in R, please let me know.

2 answers:

Answer 0 (score: 6)

We can try data.table. Convert the 'data.frame' to a 'data.table' with setDT, group by 'group', take the cummax of 'var', and assign it (:=) to a new variable ('curMax'):

library(data.table)
setDT(df)[, curMax := cummax(var), by = group]
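Here, `by = group` applies `cummax()` separately to each group's slice of `var` and writes the results back in place. For readers without data.table, the same split-apply-combine step can be written in base R. This is an analogue for illustration, not how data.table implements it:

```r
df <- data.frame(var   = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
                 group = factor(rep(c("a", "b"), each = 5)),
                 time  = rep(1:5, times = 2))

# Split var into one vector per group and take cummax of each
pieces <- lapply(split(df$var, df$group), cummax)

# Put the pieces back into the original row positions
df$curMax <- unsplit(pieces, df$group)
df$curMax
```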

As @Michael Chirico commented, if the 'time' data are not in order, we can do the sorting in 'i':

setDT(df)[order(time), curMax := cummax(var), by = group]
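Sorting in `i` means `cummax()` runs in time order, while each result is stored in the row it came from. Below is a rough base-R analogue of that behaviour on hypothetical data whose rows are out of time order. It illustrates the idea and is not data.table's internals:

```r
# Hypothetical case: rows of each group are NOT in time order
d <- data.frame(var   = c(4L, 2L, 5L, 8L, 3L, 6L),
                group = factor(c("a", "a", "a", "b", "b", "b")),
                time  = c(3L, 2L, 1L, 3L, 1L, 2L))

# Visit rows in time order, compute the per-group running max in that order,
# then store each result back at the row it came from
o <- order(d$time)
d$curMax <- NA_integer_
d$curMax[o] <- ave(d$var[o], d$group[o], FUN = cummax)
d
```

For example, the row with group b, time 3 gets 8, which is the maximum of its group's values at times 1 to 3 (3, 6, 8).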

dplyr

library(dplyr)
df %>%
    group_by(group) %>%
    mutate(curMax = cummax(var))

If the data is a tbl_sql (a database-backed table), the ordering may need to be made explicit with arrange:

df %>%
    group_by(group) %>%
    arrange(time, .by_group = TRUE) %>%
    mutate(curMax = cummax(var))

Alternatively, use dbplyr::window_order:

library(dbplyr)
df %>%
    group_by(group) %>%
    window_order(time) %>%
    mutate(curMax = cummax(var))


Answer 1 (score: 4)

You can do this in base R with ave:

df$curMax <- ave(df$var, df$group, FUN=cummax)
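`ave()` does not require each group's rows to be contiguous. It only needs the rows to be in time order within each group, and it returns a vector aligned with the input rows. Here is a quick check on an interleaved copy of the question's data:

```r
df <- data.frame(var   = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
                 group = factor(rep(c("a", "b"), each = 5)),
                 time  = rep(1:5, times = 2))

# Interleave the groups: a1, b1, a2, b2, ... (time order kept within each group)
d <- df[c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ]

# ave() splits var by group, applies cummax to each piece,
# and returns the results in d's row order
d$curMax <- ave(d$var, d$group, FUN = cummax)
d$curMax
```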