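I have a complicated question, so thank you for your patience.
For each data point, I want to read the values in the lag and lead columns, use them to pick out the neighbouring data points in the series, and compute the median of the captured values. This has to be done within each group.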
Group,Date,Month,Sales,lag,lead
Group1,42005,1,2503,0,2
Group1,42036,2,3734,0,2
Group1,42064,3,6631,2,3
Group1,42095,4,8606,0,0
Group1,42125,5,1889,0,2
Group1,42156,6,4819,1,2
Group1,42186,7,3294,1,0
Group1,42217,8,38999,2,0
Group1,42248,9,28372,1,0
Group1,42278,10,25396,4,1
Group1,42309,11,21093,1,0
Group2,42339,1,9263,0,3
Group2,42005,2,6660,1,3
Group2,42036,3,28595,2,2
Group2,42064,4,123,2,0
Group2,42095,5,11855,3,3
Group2,42125,6,15845,4,3
Group2,42156,7,32331,2,2
Group2,42186,8,3188,1,1
Group2,42217,9,38161,4,0
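For example, take month 6 of Group1: the Sales value is 4819, and the lag and lead values are 1 and 2. I want to capture the lag and lead values first and then look up the matching rows in the series, much like a vlookup. For 4819, the lag (value = 1) means going one data point above 4819, i.e. 1889 (4819 -> 1889). Similarly, the lead (value = 2) means going two data points below 4819, i.e. 3294 and 38999. So the captured points for 4819 are 1889, 4819, 3294 and 38999. I want to take their median and store it in my output. I need to do this exercise for every group.
Similarly, for Group2, month 4, I want to capture the 2 preceding (lag) data points relative to 123 (the lead is zero, so nothing is captured on that side) and take the median of the 3 values in total.
I tried an ifelse condition for one specific case to see how it works.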
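To make the two examples above concrete, here is a minimal sketch that slices the window by row position and takes its median. The vector names (`sales_g1`, `sales_g2`, `win_g1`, `win_g2`) are made up for illustration; the numbers are copied from the table in the question.

```r
# Sales for Group1 and Group2, in row order, copied from the question
sales_g1 <- c(2503, 3734, 6631, 8606, 1889, 4819, 3294, 38999, 28372, 25396, 21093)
sales_g2 <- c(9263, 6660, 28595, 123, 11855, 15845, 32331, 3188, 38161)

# Group1, month 6: lag = 1, lead = 2 -> rows 5 to 8
win_g1 <- sales_g1[(6 - 1):(6 + 2)]
win_g1          # 1889 4819 3294 38999
median(win_g1)  # 4056.5

# Group2, month 4: lag = 2, lead = 0 -> rows 2 to 4
win_g2 <- sales_g2[(4 - 2):(4 + 0)]
win_g2          # 6660 28595 123
median(win_g2)  # 6660
```

Both medians match the Output column the question asks for.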
df$output <- ifelse(lag == 0 & lead == 1, median(Sales, lead(Sales, 1)), 0)
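The result was quite surprising: R took the median of all values in the column. Another problem is that, even if it worked, I would have to write multiple ifelse conditions, so I am looking for a simpler solution.
I am not sure how to approach the problem and do this exercise for every group in R.
Below is the output I want to achieve.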
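A likely reason for that result, sketched under assumptions: `median()` collapses its whole first argument into a single number (its second parameter is `na.rm`, so a second vector is not pooled into the data), and `ifelse()` then recycles that one number across every row where the test is TRUE. The `cond` vector below is a hypothetical stand-in for a condition like `lag == 0 & lead == 1`; the Sales values are the first four rows of the question.

```r
sales <- c(2503, 3734, 6631, 8606)
cond  <- c(TRUE, FALSE, TRUE, FALSE)  # hypothetical per-row condition

# median() is a summary, not a row-wise function: one value for the whole vector
m <- median(sales)
length(m)  # 1

# ifelse() recycles that single value wherever cond is TRUE
res <- ifelse(cond, m, 0)
res  # 5182.5 0.0 5182.5 0.0
```

So every matching row receives the same whole-column median, which is why a per-row window has to be built explicitly.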
Group,Date,Month,Sales,lag,lead,Output
Group1,42005,1,2503,0,2,3734
Group1,42036,2,3734,0,2,6631
Group1,42064,3,6631,2,3,4276.5
Group1,42095,4,8606,0,0,8606
Group1,42125,5,1889,0,2,3294
Group1,42156,6,4819,1,2,4056.5
Group1,42186,7,3294,1,0,4056.5
Group1,42217,8,38999,2,0,4819
Group1,42248,9,28372,1,0,33685.5
Group1,42278,10,25396,4,1,23244.5
Group1,42309,11,21093,1,0,23244.5
Group2,42339,1,9263,0,3,7961.5
Group2,42005,2,6660,1,3,9263
Group2,42036,3,28595,2,2,9263
Group2,42064,4,123,2,0,6660
Group2,42095,5,11855,3,3,11855
Group2,42125,6,15845,4,3,13850
Group2,42156,7,32331,2,2,15845
Group2,42186,8,3188,1,1,32331
Group2,42217,9,38161,4,0,15845
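Any leads would be highly appreciated.
What am I missing? Please guide me on how to solve this problem, and let me know if there is a function I should be using.
Thanks,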
Answer 0 (score: 2)
df$Output <- sapply(seq(nrow(df)), # For each row (number) in df
  function(i)
    # take the median of Sales from
    # current row - current lag value
    # to
    # current row + current lead value
    with(df, median(Sales[(i - lag[i]):(i + lead[i])])))
Data used:
df <- data.table::fread("
Group,Date,Month,Sales,lag,lead
Group1,42005,1,2503,0,2
Group1,42036,2,3734,0,2
Group1,42064,3,6631,2,3
Group1,42095,4,8606,0,0
Group1,42125,5,1889,0,2
Group1,42156,6,4819,1,2
Group1,42186,7,3294,1,0
Group1,42217,8,38999,2,0
Group1,42248,9,28372,1,0
Group1,42278,10,25396,4,1
Group1,42309,11,21093,1,0
Group2,42339,1,9263,0,3
Group2,42005,2,6660,1,3
Group2,42036,3,28595,2,2
Group2,42064,4,123,2,0
Group2,42095,5,11855,3,3
Group2,42125,6,15845,4,3
Group2,42156,7,32331,2,2
Group2,42186,8,3188,1,1
Group2,42217,9,38161,4,0
")
dout <- data.table::fread("
Group,Date,Month,Sales,lag,lead,Output
Group1,42005,1,2503,0,2,3734
Group1,42036,2,3734,0,2,6631
Group1,42064,3,6631,2,3,4276.5
Group1,42095,4,8606,0,0,8606
Group1,42125,5,1889,0,2,3294
Group1,42156,6,4819,1,2,4056.5
Group1,42186,7,3294,1,0,4056.5
Group1,42217,8,38999,2,0,4819
Group1,42248,9,28372,1,0,33685.5
Group1,42278,10,25396,4,1,23244.5
Group1,42309,11,21093,1,0,23244.5
Group2,42339,1,9263,0,3,7961.5
Group2,42005,2,6660,1,3,9263
Group2,42036,3,28595,2,2,9263
Group2,42064,4,123,2,0,6660
Group2,42095,5,11855,3,3,11855
Group2,42125,6,15845,4,3,13850
Group2,42156,7,32331,2,2,15845
Group2,42186,8,3188,1,1,32331
Group2,42217,9,38161,4,0,15845
")
all.equal(df$Output, dout$Output)
# [1] TRUE
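One caveat about the approach above: `(i - lag[i]):(i + lead[i])` assumes the window never runs off either end of the series. In R a zero index is silently dropped, an index past the end returns NA, and mixing negative with positive indices is an error. The sample data never triggers this, but other data might. The sketch below is one possible guard, not part of the original answer: `window_median` is a hypothetical helper that clamps the window to the available rows.

```r
# Hypothetical helper: median over a per-row window, clamped to the series bounds
window_median <- function(sales, lag_n, lead_n) {
  n <- length(sales)
  vapply(seq_len(n), function(i) {
    from <- max(1, i - lag_n[i])   # never start before the first row
    to   <- min(n, i + lead_n[i])  # never end after the last row
    as.numeric(median(sales[from:to]))
  }, numeric(1))
}

# Group1 from the question: the clamped version gives the same Output
sales  <- c(2503, 3734, 6631, 8606, 1889, 4819, 3294, 38999, 28372, 25396, 21093)
lag_n  <- c(0, 0, 2, 0, 0, 1, 1, 2, 1, 4, 1)
lead_n <- c(2, 2, 3, 0, 2, 2, 0, 0, 0, 1, 0)
out <- window_median(sales, lag_n, lead_n)

# A window larger than the series is cut down to the whole series
clamped <- window_median(c(1, 2, 3), c(5, 5, 5), c(5, 5, 5))
clamped  # 2 2 2
```

Clamping is a design choice; depending on the use case, returning NA for an incomplete window may be more appropriate.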
Answer 1 (score: 1)
setDT(df)[, i := sequence(.N)][
  , med := as.numeric(median(df$Sales[(i - lag):(i + lead)])), by = i][
  , i := NULL][]
     Group  Date Month Sales lag lead     med
 1: Group1 42005     1  2503   0    2  3734.0
 2: Group1 42036     2  3734   0    2  6631.0
 3: Group1 42064     3  6631   2    3  4276.5
 4: Group1 42095     4  8606   0    0  8606.0
 5: Group1 42125     5  1889   0    2  3294.0
 6: Group1 42156     6  4819   1    2  4056.5
 7: Group1 42186     7  3294   1    0  4056.5
 8: Group1 42217     8 38999   2    0  4819.0
 9: Group1 42248     9 28372   1    0 33685.5
10: Group1 42278    10 25396   4    1 23244.5
11: Group1 42309    11 21093   1    0 23244.5
12: Group2 42339     1  9263   0    3  7961.5
13: Group2 42005     2  6660   1    3  9263.0
14: Group2 42036     3 28595   2    2  9263.0
15: Group2 42064     4   123   2    0  6660.0
16: Group2 42095     5 11855   3    3 11855.0
17: Group2 42125     6 15845   4    3 13850.0
18: Group2 42156     7 32331   2    2 15845.0
19: Group2 42186     8  3188   1    1 32331.0
20: Group2 42217     9 38161   4    0 15845.0
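Note that the chain above indexes `df$Sales` across the whole table, so it matches the expected output only because no window in the sample data crosses from one group into the other. If that could happen, one possible variant is to compute row positions inside each group with `by = Group`. The sketch below uses made-up toy data (groups "A" and "B", a table named `dt`) in which row 3 has a lead of 2 and would otherwise reach into group B; it also clamps the window to the group, which is an assumption about the desired behaviour.

```r
library(data.table)

# Hypothetical toy data: row 3 (lead = 2) and row 4 (lag = 2) would cross groups
dt <- data.table(
  Group = rep(c("A", "B"), each = 3),
  Sales = c(10, 20, 30, 100, 200, 300),
  lag   = c(0, 1, 1, 2, 1, 0),
  lead  = c(1, 1, 2, 0, 1, 0)
)

dt[, med := {
  from <- pmax(1, seq_len(.N) - lag)    # window start, clamped to the group
  to   <- pmin(.N, seq_len(.N) + lead)  # window end, clamped to the group
  mapply(function(a, b) median(Sales[a:b]), from, to)
}, by = Group]

dt$med  # 15 20 25 100 200 300
```

With whole-table indexing, rows 3 and 4 would instead mix Sales from both groups.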