Question

I have a data.frame with 4 variables: day (Date, format: "YYYY-MM-DD"), hour (POSIXct, format: "YYYY-MM-DD hh:mm:ss"), department (chr) and amount (numeric).

          day                hour department amount max_cond
1  2019-08-08 2019-08-08 11:45:00       DPT1      2        3
2  2019-08-08 2019-08-08 12:00:00       DPT1      3        3
3  2019-08-08 2019-08-08 12:15:00       DPT1      3        3
4  2019-08-08 2019-08-08 12:30:00       DPT1      2        2
5  2019-08-08 2019-08-08 12:45:00       DPT1      0        2
6  2019-08-08 2019-08-08 13:00:00       DPT1      0        2
7  2019-08-08 2019-08-08 13:15:00       DPT1      1        2
8  2019-08-08 2019-08-08 13:30:00       DPT1      2        2
9  2019-08-08 2019-08-08 13:45:00       DPT1      1        1
10 2019-08-08 2019-08-08 11:45:00       DPT2      3        3
11 2019-08-08 2019-08-08 12:00:00       DPT2      3        3
12 2019-08-08 2019-08-08 12:15:00       DPT2      3        3
13 2019-08-08 2019-08-08 12:30:00       DPT2      2        3
14 2019-08-08 2019-08-08 12:45:00       DPT2      2        3
15 2019-08-08 2019-08-08 13:00:00       DPT2      3        3
16 2019-08-08 2019-08-08 13:15:00       DPT2      0        0
17 2019-08-08 2019-08-08 13:30:00       DPT2      0        0
18 2019-08-08 2019-08-08 13:45:00       DPT2      0        0

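For each row of the data.frame, I want the maximum of amount, grouped by day and department, but taken only over the hours of that day that are greater than or equal to the hour of the row in question.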

In other words, for each observation [day_i, hour_i, department_i], I want: max(amount | (day == day_i) & (department == department_i) & (hour >= hour_i)).

For the example above, the result should be the max_cond column shown in the table.
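
Read literally, the condition can be evaluated one row at a time. The following is a minimal base R sketch on hypothetical toy data (the `toy` data frame and its values are made up for illustration and are not the data from the question). It compares every row against every other row, so it is meant as a reference for the definition rather than an efficient solution:

```r
# Hypothetical toy data: one day, two departments, three 15-minute slots each.
toy <- data.frame(
  day        = as.Date("2019-08-08"),
  hour       = as.POSIXct("2019-08-08 11:45:00", tz = "UTC") + 900 * rep(0:2, times = 2),
  department = rep(c("DPT1", "DPT2"), each = 3),
  amount     = c(2, 3, 1, 0, 3, 2)
)

# For row i: max of amount over rows with the same day and department
# whose hour is at or after hour_i.
toy$max_cond <- vapply(seq_len(nrow(toy)), function(i) {
  keep <- toy$day == toy$day[i] &
    toy$department == toy$department[i] &
    toy$hour >= toy$hour[i]
  max(toy$amount[keep])
}, numeric(1))

print(toy$max_cond)
# 3 3 1 3 3 2
```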


Answer 1

A very similar approach, but using data.table:

library(data.table)

df <- structure(list(
  day = structure(c(18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116), class = "Date"), 
  hour = structure(c(1565275500, 1565276400, 1565277300, 1565278200, 1565279100, 1565280000, 1565280900, 1565281800, 1565282700, 1565275500, 1565276400, 1565277300, 1565278200, 1565279100, 1565280000, 1565280900, 1565281800, 1565282700), class = c("POSIXct", "POSIXt"), tzone = ""), 
  department = c("DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2"), 
  amount = c(2, 3, 3, 2, 0, 0, 1, 2, 1, 3, 3, 3, 2, 2, 3, 0, 0, 0), max_cond = c(3, 3, 3, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3, 3, 3, 0, 0, 0)), row.names = c(NA, -18L), class = "data.frame")

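dt <- data.table(df)
# Sort latest-first so that cummax() accumulates from the end of each day
setorder(dt, -hour)
# Running maximum within each day/department group
dt[, max_cond_new := cummax(amount), by = .(day, department)]
# Restore the original ordering
setorder(dt, department, hour)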

Hope this helps!

Answer 2

A base R approach: you can use cummax() (cumulative maximum) to solve this. Note that I assume your data frame is already sorted by hour, as it is in your example.
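
To see why cummax() helps, here is a small sketch on the DPT1 amounts from the question, assuming they are already in chronological order. cummax() runs from the left, so reversing the vector before and after turns it into "the maximum of this element and everything after it":

```r
# amount values of DPT1 from the question, sorted by hour
amount <- c(2, 3, 3, 2, 0, 0, 1, 2, 1)

# Running maximum from the left: not what we want here
running_max <- cummax(amount)

# Reverse, take the running maximum, reverse back:
# each position now holds the max of itself and all later values
suffix_max <- rev(cummax(rev(amount)))

print(running_max)
# 2 3 3 3 3 3 3 3 3
print(suffix_max)
# 3 3 3 2 2 2 2 2 1
```

The second vector matches the max_cond values listed for DPT1 in the question.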

The idea is: first split() the data frame into components, one for each distinct combination of day and department. Then, within each component:

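Reverse the relevant vector ($amount)
Build the $max_cond variable with cummax() on the reversed vector
Flip the $max_cond variable back into the correct order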

Then glue all the components back together with do.call() and rbind().
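
As a side note, the same split, apply and recombine pattern can also be written with base R's ave(), which applies a function within groups and returns the results in the original row positions. This is an alternative sketch, not the method used in this answer, and the `toy` data frame (with an integer `slot` column standing in for hour) is hypothetical:

```r
# Hypothetical toy data: two days, two departments, three time slots each.
toy <- data.frame(
  day        = as.Date(rep(c("2019-08-08", "2019-08-09"), each = 6)),
  slot       = rep(1:3, times = 4),   # stands in for the hour column
  department = rep(rep(c("DPT1", "DPT2"), each = 3), times = 2),
  amount     = c(2, 3, 1,  0, 3, 2,  1, 0, 0,  4, 1, 5)
)

# Make sure rows within each group run from earliest to latest.
toy <- toy[order(toy$day, toy$department, toy$slot), ]

# Within each day/department group, take the reversed running maximum.
toy$max_cond <- ave(
  toy$amount, toy$day, toy$department,
  FUN = function(v) rev(cummax(rev(v)))
)

print(toy$max_cond)
# 3 3 1 3 3 2 1 0 0 5 5 5
```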

Using your example:

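df2 <- split(df, list(df$department, df$day))
df2 <- lapply(df2, function(x) {
  # amount ordered from the latest to the earliest hour
  rev_amount <- x$amount[order(x$hour, decreasing = TRUE)]
  # running max over the reversed vector, flipped back to chronological order
  x$max_cond <- rev(cummax(rev_amount))
  x
})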

df2 <- do.call(rbind, df2)
row.names(df2) <- NULL

df2
##           day                hour department amount max_cond
## 1  2019-08-08 2019-08-08 10:45:00       DPT1      2        3
## 2  2019-08-08 2019-08-08 11:00:00       DPT1      3        3
## 3  2019-08-08 2019-08-08 11:15:00       DPT1      3        3
## 4  2019-08-08 2019-08-08 11:30:00       DPT1      2        2
## 5  2019-08-08 2019-08-08 11:45:00       DPT1      0        2
## 6  2019-08-08 2019-08-08 12:00:00       DPT1      0        2
## 7  2019-08-08 2019-08-08 12:15:00       DPT1      1        2
## 8  2019-08-08 2019-08-08 12:30:00       DPT1      2        2
## 9  2019-08-08 2019-08-08 12:45:00       DPT1      1        1
## 10 2019-08-08 2019-08-08 10:45:00       DPT2      3        3
## 11 2019-08-08 2019-08-08 11:00:00       DPT2      3        3
## 12 2019-08-08 2019-08-08 11:15:00       DPT2      3        3
## 13 2019-08-08 2019-08-08 11:30:00       DPT2      2        3
## 14 2019-08-08 2019-08-08 11:45:00       DPT2      2        3
## 15 2019-08-08 2019-08-08 12:00:00       DPT2      3        3
## 16 2019-08-08 2019-08-08 12:15:00       DPT2      0        0
## 17 2019-08-08 2019-08-08 12:30:00       DPT2      0        0
## 18 2019-08-08 2019-08-08 12:45:00       DPT2      0        0

Get the maximum by group with an additional constraint

2 answers: