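I need to find the running maximum of a variable by group using R. Within each group, the data is sorted by time using df[order(df$group, df$time),].
My variable has some NAs, but I can deal with that by replacing them with zeros.
This is what the data frame df looks like: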
(df <- structure(list(var = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
.Label = c("a", "b"), class = "factor"),
time = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)),
.Names = c("var", "group","time"),
class = "data.frame", row.names = c(NA, -10L)))
# var group time
# 1 5 a 1
# 2 2 a 2
# 3 3 a 3
# 4 4 a 4
# 5 0 a 5
# 6 3 b 1
# 7 6 b 2
# 8 4 b 3
# 9 8 b 4
# 10 4 b 5
I want a variable curMax:
var | group | time | curMax
5 a 1 5
2 a 2 5
3 a 3 5
4 a 4 5
0 a 5 5
3 b 1 3
6 b 2 6
4 b 3 6
8 b 4 8
4 b 5 8
Please let me know if you know how to implement this in R.
Answer 0 (score: 6)
We can try data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1), where df1 stands for the question's data frame df. Grouped by 'group', we take the cummax of 'var' and assign (:=) it to a new variable ('curMax'):
library(data.table)
setDT(df1)[, curMax := cummax(var), by = group]
As @Michael Chirico commented, if the data is not ordered by 'time', we can do the ordering in 'i':
setDT(df1)[order(time), curMax:=cummax(var), by = group]
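To see what ordering in 'i' changes, here is a minimal sketch on a made-up three-row table (the values are hypothetical, not the question's data) whose rows are deliberately out of time order. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data: one group, rows NOT sorted by time.
df1 <- data.frame(var = c(4L, 1L, 3L), group = "a", time = c(3L, 1L, 2L))

# Running max computed in time order, written back to the original row positions.
setDT(df1)[order(time), curMax := cummax(var), by = group]

# For comparison: running max in plain row order, ignoring time.
df1[, rowOrderMax := cummax(var), by = group]

print(df1)
# curMax is 4, 1, 3 (time order); rowOrderMax is 4, 4, 4 (row order)
```

The rows keep their original positions; only the order in which cummax sees the values changes.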
Or with dplyr:
library(dplyr)
df1 %>%
group_by(group) %>%
mutate(curMax = cummax(var))
If df1 is a tbl_sql, explicit ordering might be needed, using arrange:
df1 %>%
group_by(group) %>%
arrange(time, .by_group=TRUE) %>%
mutate(curMax = cummax(var))
or dbplyr::window_order:
library(dbplyr)
df1 %>%
group_by(group) %>%
window_order(time) %>%
mutate(curMax = cummax(var))
Answer 1 (score: 4)
You can do it like this:
df$curMax <- ave(df$var, df$group, FUN=cummax)
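The ave call above relies on the rows already being sorted by time within each group and on var having no NA. As a rough base-R sketch of how both cases from the question might be handled together, the example below uses made-up data (hypothetical values, not the question's df). Replacing NA with zero is only safe under the assumption that var is never negative.

```r
# Hypothetical toy data: one NA, rows deliberately not sorted by time.
df <- data.frame(
  var   = c(1L, NA, 3L, 2L, 6L, 4L),
  group = c("a", "a", "a", "b", "b", "b"),
  time  = c(3L, 2L, 1L, 2L, 3L, 1L)
)

# Replace NA with zero, as the question suggests (assumes var >= 0).
v <- ifelse(is.na(df$var), 0L, df$var)

# Row indices sorted by group, then by time.
o <- order(df$group, df$time)

# Compute the running max in sorted order and write it back
# to the original row positions.
df$curMax <- NA_integer_
df$curMax[o] <- ave(v[o], df$group[o], FUN = cummax)

print(df)
```

Without the zero replacement, cummax would return NA for every position from the first missing value onward within a group.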