I have a data.frame with 4 variables: day (Date, format "YYYY-MM-DD"), hour (POSIXct, format "YYYY-MM-DD hh:mm:ss"), department (chr) and amount (numeric).
day hour department amount max_cond
1 2019-08-08 2019-08-08 11:45:00 DPT1 2 3
2 2019-08-08 2019-08-08 12:00:00 DPT1 3 3
3 2019-08-08 2019-08-08 12:15:00 DPT1 3 3
4 2019-08-08 2019-08-08 12:30:00 DPT1 2 2
5 2019-08-08 2019-08-08 12:45:00 DPT1 0 2
6 2019-08-08 2019-08-08 13:00:00 DPT1 0 2
7 2019-08-08 2019-08-08 13:15:00 DPT1 1 2
8 2019-08-08 2019-08-08 13:30:00 DPT1 2 2
9 2019-08-08 2019-08-08 13:45:00 DPT1 1 1
10 2019-08-08 2019-08-08 11:45:00 DPT2 3 3
11 2019-08-08 2019-08-08 12:00:00 DPT2 3 3
12 2019-08-08 2019-08-08 12:15:00 DPT2 3 3
13 2019-08-08 2019-08-08 12:30:00 DPT2 2 3
14 2019-08-08 2019-08-08 12:45:00 DPT2 2 3
15 2019-08-08 2019-08-08 13:00:00 DPT2 3 3
16 2019-08-08 2019-08-08 13:15:00 DPT2 0 0
17 2019-08-08 2019-08-08 13:30:00 DPT2 0 0
18 2019-08-08 2019-08-08 13:45:00 DPT2 0 0
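For each row of the data.frame, I want the maximum of amount, grouped by day and department, but taken only over the hours of that day that are greater than or equal to the hour of the row in question.
In other words, for each observation [day_i, hour_i, department_i], I want: max(amount | (day == day_i) & (department == department_i) & (hour >= hour_i)).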
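The definition above can be written out directly as a brute-force check. This is only a sketch: the data frame `df_small` is a hypothetical sample holding the DPT1 rows from the question, and the fixed `tz = "UTC"` is an assumption made for reproducibility.

```r
# Hypothetical sample: the DPT1 rows from the question, one row per 15 minutes
df_small <- data.frame(
  day        = as.Date("2019-08-08"),
  hour       = as.POSIXct("2019-08-08 11:45:00", tz = "UTC") + 900 * (0:8),
  department = "DPT1",
  amount     = c(2, 3, 3, 2, 0, 0, 1, 2, 1)
)

# For row i, keep the rows with the same day and department whose hour is
# at or after hour_i, then take the maximum amount among them
df_small$max_cond <- vapply(seq_len(nrow(df_small)), function(i) {
  keep <- df_small$day == df_small$day[i] &
    df_small$department == df_small$department[i] &
    df_small$hour >= df_small$hour[i]
  max(df_small$amount[keep])
}, numeric(1))

df_small$max_cond
```

This compares every row with every other row, so it is quadratic in the number of rows. It is mainly useful as a reference to validate a faster solution.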
For the example above, the expected result is the max_cond column shown in the table.
Answer 0 (score: 2)
A very similar approach, but using data.table:
library(data.table)
df <- structure(list(
day = structure(c(18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116), class = "Date"),
hour = structure(c(1565275500, 1565276400, 1565277300, 1565278200, 1565279100, 1565280000, 1565280900, 1565281800, 1565282700, 1565275500, 1565276400, 1565277300, 1565278200, 1565279100, 1565280000, 1565280900, 1565281800, 1565282700), class = c("POSIXct", "POSIXt"), tzone = ""),
department = c("DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2"),
amount = c(2, 3, 3, 2, 0, 0, 1, 2, 1, 3, 3, 3, 2, 2, 3, 0, 0, 0), max_cond = c(3, 3, 3, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3, 3, 3, 0, 0, 0)), row.names = c(NA, -18L), class = "data.frame")
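dt <- data.table(df)
# Sort by hour, latest first, so that cummax() scans from the end of the day backwards
setorder(dt, -hour)
# Running maximum of amount within each (day, department) group
dt[, max_cond_new := cummax(amount), by = .(day, department)]
# Restore the original ordering
setorder(dt, department, hour)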
Hope this helps!
Answer 1 (score: 0)
A base R approach: you can use cummax() (cumulative maximum) to solve this. Note that I assume your data frame is already sorted by hour, which is the case in your example.
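As a minimal sketch of why cummax() fits here, consider a single group on its own. The vector `amount_dpt1` is a hypothetical name holding the DPT1 amounts from the question, assumed to be ordered by hour.

```r
# DPT1 amounts from the question, assumed to be ordered by hour
amount_dpt1 <- c(2, 3, 3, 2, 0, 0, 1, 2, 1)

# cummax() on its own looks backwards: the maximum over rows at or before i
cummax(amount_dpt1)

# Reversing first makes it look forwards (rows at or after i);
# the outer rev() puts the result back in the original order
suffix_max <- rev(cummax(rev(amount_dpt1)))
suffix_max
```

The second result is the "maximum from this hour onwards" that the question asks for, computed in a single pass over the group.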
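The idea is: first split() the data frame into components, one for each combination of day and department. Then, within each component: (1) order the rows by hour in decreasing order; (2) construct the cummax() of amount on that reversed order; (3) flip the result back to the correct order and store it as $max_cond. Finally, glue all the components back together with do.call() and rbind().
Using your example: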
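# One component per (department, day) combination
df2 <- split(df, list(df$department, df$day))
df2 <- lapply(df2, function(x) {
  # Running maximum of amount from the latest hour backwards, flipped back
  # to ascending order (assumes x is sorted by hour); no pipe, so base R only
  x$max_cond <- rev(cummax(x$amount[order(x$hour, decreasing = TRUE)]))
  x
})
# Stack the components back into a single data frame
df2 <- do.call(rbind, df2)
row.names(df2) <- NULL
df2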
## day hour department amount max_cond
## 1 2019-08-08 2019-08-08 10:45:00 DPT1 2 3
## 2 2019-08-08 2019-08-08 11:00:00 DPT1 3 3
## 3 2019-08-08 2019-08-08 11:15:00 DPT1 3 3
## 4 2019-08-08 2019-08-08 11:30:00 DPT1 2 2
## 5 2019-08-08 2019-08-08 11:45:00 DPT1 0 2
## 6 2019-08-08 2019-08-08 12:00:00 DPT1 0 2
## 7 2019-08-08 2019-08-08 12:15:00 DPT1 1 2
## 8 2019-08-08 2019-08-08 12:30:00 DPT1 2 2
## 9 2019-08-08 2019-08-08 12:45:00 DPT1 1 1
## 10 2019-08-08 2019-08-08 10:45:00 DPT2 3 3
## 11 2019-08-08 2019-08-08 11:00:00 DPT2 3 3
## 12 2019-08-08 2019-08-08 11:15:00 DPT2 3 3
## 13 2019-08-08 2019-08-08 11:30:00 DPT2 2 3
## 14 2019-08-08 2019-08-08 11:45:00 DPT2 2 3
## 15 2019-08-08 2019-08-08 12:00:00 DPT2 3 3
## 16 2019-08-08 2019-08-08 12:15:00 DPT2 0 0
## 17 2019-08-08 2019-08-08 12:30:00 DPT2 0 0
## 18 2019-08-08 2019-08-08 12:45:00 DPT2 0 0
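A more compact variant of the same idea, offered only as a sketch, uses ave() to apply the reverse cumulative maximum within each group. The data frame `df_two` is a hypothetical two-department sample, and the code assumes the rows are already sorted by hour within each group.

```r
# Hypothetical sample: two departments, three quarter-hour slots each,
# already sorted by hour within each department
df_two <- data.frame(
  day        = as.Date("2019-08-08"),
  hour       = as.POSIXct("2019-08-08 11:45:00", tz = "UTC") + 900 * rep(0:2, 2),
  department = rep(c("DPT1", "DPT2"), each = 3),
  amount     = c(2, 3, 1, 0, 3, 2)
)

# ave() applies FUN within each (day, department) group and returns the
# values in the original row order, so no split()/rbind() step is needed
df_two$max_cond <- ave(
  df_two$amount, df_two$day, df_two$department,
  FUN = function(a) rev(cummax(rev(a)))
)

df_two$max_cond
```

Because ave() keeps the original row order, the row names do not need to be reset afterwards.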