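I want to generate a column, "PriorityCountInLast7Days". For a given employee (say A), this column counts the cases from the last 7 days whose Priority is the same as the current case's. How can I build it in R from the first 4 columns?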
data <- data.frame(
Date = c("2018-06-01", "2018-06-03", "2018-06-03", "2018-06-03", "2018-06-03", "2018-06-04", "2018-06-01", "2018-06-02", "2018-06-03"),
Emp1 = c("A","A","A","A","A","A","B","B","B"),
Case = c("A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2", "B3"),
Priority = c(0,0,0,1,2,0,0,0,0),
PriorityCountinLast7days = c(0,1,2,1,1,3,1,2,3))
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
Answer 0 (score: 0)
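You can build this rolling window with an iterated conditional sum over the whole data set. What does that mean? Inside a for loop, you check three things for every row: that its date is no later than the current date, that its date is no earlier than the start of the 7-day window, and that its case equals the current case. Combining these logical checks inside the loop gives you the rolling filter, and the function then sums Priority over the rows that pass it. Here is a function: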
rollPriority <- function(data, window = 7) {
  # Fail early if a required column is missing
  stopifnot(all(c("Date", "Case", "Priority") %in% colnames(data)))
  data$Date <- as.Date(data$Date)
  rolled <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    # Keep dates from (window - 1) days before the current date up to the current date
    datecheck <- (data$Date[i] - (window - 1)) <= data$Date & data$Date <= data$Date[i]
    # Keep rows that belong to the same case as the current row
    casecheck <- data$Case == data$Case[i]
    # Sum Priority over the rows that pass both checks
    rolled[i] <- sum(data$Priority[which(datecheck & casecheck)])
  }
  # Name the new column after the window size, e.g. PriorityCountinLast7days
  data[[paste0("PriorityCountinLast", window, "days")]] <- rolled
  return(data)
}
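The function above filters on Case and sums Priority. If what you need is literally the count described in the question (same employee, same priority, within the window), a variant along the following lines might be closer. This is only a sketch under assumptions: `countSamePriority` and the output column name are hypothetical names introduced here, the current case is counted as part of its own window, and cases on the same date count toward each other regardless of row order. The expected column in the question is not fully consistent on those points (A1 shows 0 while B1 shows 1, although each is the first priority-0 case of its employee), so the result is not meant to reproduce it exactly.

```r
# Hypothetical helper (not part of the original answer): for each row, count the
# rows of the same employee with the same priority dated inside the window.
countSamePriority <- function(data, window = 7) {
  stopifnot(all(c("Date", "Emp1", "Priority") %in% colnames(data)))
  dates <- as.Date(data$Date)
  counts <- integer(nrow(data))
  for (i in seq_len(nrow(data))) {
    # Window runs from (window - 1) days before the current date up to that date
    in_window <- dates >= dates[i] - (window - 1) & dates <= dates[i]
    same_emp  <- data$Emp1 == data$Emp1[i]
    same_prio <- data$Priority == data$Priority[i]
    # Assumption: the current row is included in its own count
    counts[i] <- sum(in_window & same_emp & same_prio)
  }
  data[[paste0("PriorityCountInLast", window, "Days")]] <- counts
  data
}

# The question's data, using the nine dates shown in its table
cases <- data.frame(
  Date = c("2018-06-01", "2018-06-03", "2018-06-03", "2018-06-03", "2018-06-03",
           "2018-06-04", "2018-06-01", "2018-06-02", "2018-06-03"),
  Emp1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  Priority = c(0, 0, 0, 1, 2, 0, 0, 0, 0)
)
result <- countSamePriority(cases)
print(result)
```

If the current case should not count toward its own total, subtracting 1 from the new column would give that variant.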
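In future, please provide sample data that reproduces your output. You will notice that with only 4 days of data we cannot reproduce the 7-day rolling output you expect. A quick way to build such data is to use expand.grid() to generate the combinations and set.seed() to keep the sampled values reproducible: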
# Reproducible Example Data
dat <- expand.grid(
  Date = seq.Date(as.Date("2018-06-01"), as.Date("2018-06-04"), by = "day"),
  # Builds the case labels A1, A2, B1, B2
  Case = as.factor(sort(apply(expand.grid(c("A", "B"), 1:2), 1, paste0, collapse = "")))
)
# Ensures random sampling is identical each time
set.seed(42)
dat$Priority <- sample(0:1, nrow(dat), replace = TRUE)
# The function
rollPriority(dat, 2)
# Date Case Priority PriorityCountinLast2days
#1 2018-06-01 A1 1 1
#2 2018-06-02 A1 1 2
#3 2018-06-03 A1 0 1
#4 2018-06-04 A1 1 1
#5 2018-06-01 A2 1 1
#6 2018-06-02 A2 1 2
#7 2018-06-03 A2 1 2
#8 2018-06-04 A2 0 1
#9 2018-06-01 B1 1 1
#10 2018-06-02 B1 1 2
#11 2018-06-03 B1 0 1
#12 2018-06-04 B1 1 1
#13 2018-06-01 B2 1 1
#14 2018-06-02 B2 0 1
#15 2018-06-03 B2 0 0
#16 2018-06-04 B2 1 1
That makes it much easier for someone to help you accurately.
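For anyone unfamiliar with the two helpers used to build the example data, here is a minimal sketch of what each one contributes. The names `grid`, `first_draw` and `second_draw` are made up for this illustration.

```r
# expand.grid() returns one row per combination of its arguments
grid <- expand.grid(Emp = c("A", "B"), Day = 1:3)
print(nrow(grid))  # 2 employees * 3 days = 6 rows

# Setting the same seed before each draw makes the random sample repeatable
set.seed(42)
first_draw <- sample(0:1, nrow(grid), replace = TRUE)
set.seed(42)
second_draw <- sample(0:1, nrow(grid), replace = TRUE)
print(identical(first_draw, second_draw))  # TRUE
```

One caveat: R 3.6.0 changed the default sampling algorithm, so the same seed can give different values on older and newer R versions. The Priority values printed above may therefore not match what you get when you run the example yourself.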