Median of a lag/lead window for each data point, by group, in R

Posted: 2018-06-08 18:09:41

Tags: r dataframe

I have a complicated problem, so thank you for your patience.

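For each data point, I want to read the values in its lag and lead columns, use them to pick out the neighbouring data points in the series, and compute the median of the captured values, separately for each group.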


Group,Date,Month,Sales,lag,lead
Group1,42005,1,2503,0,2
Group1,42036,2,3734,0,2
Group1,42064,3,6631,2,3
Group1,42095,4,8606,0,0
Group1,42125,5,1889,0,2
Group1,42156,6,4819,1,2
Group1,42186,7,3294,1,0
Group1,42217,8,38999,2,0
Group1,42248,9,28372,1,0
Group1,42278,10,25396,4,1
Group1,42309,11,21093,1,0
Group2,42339,1,9263,0,3
Group2,42005,2,6660,1,3
Group2,42036,3,28595,2,2
Group2,42064,4,123,2,0
Group2,42095,5,11855,3,3
Group2,42125,6,15845,4,3
Group2,42156,7,32331,2,2
Group2,42186,8,3188,1,1
Group2,42217,9,38161,4,0

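For example, for Group1, month 6, the Sales value is 4819, and the lag and lead values are 1 and 2. I want to first capture the lag and lead values and then do something like a VLOOKUP in the series. For 4819, the lag is 1, so I want to go one data point above 4819, i.e. 1889 (4819 -> 1889). Similarly, the lead is 2, so I want to go two data points below 4819, i.e. 3294 and 38999. The captured points for 4819 are therefore 1889, 4819, 3294 and 38999; I want to take the median of these and store it in my output. I need to do this for every group.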
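The window for this example can be checked directly in R. A minimal sketch, assuming the Group1 Sales values are copied from the table above into a plain numeric vector; the names `sales`, `i`, `n_lag`, `n_lead` and `captured` are illustrative only:

```r
# Sales for Group1, months 1-11 (copied from the table above)
sales <- c(2503, 3734, 6631, 8606, 1889, 4819, 3294, 38999, 28372, 25396, 21093)

i      <- 6  # position of month 6 (Sales = 4819) within Group1
n_lag  <- 1  # go back one data point
n_lead <- 2  # go forward two data points

captured <- sales[(i - n_lag):(i + n_lead)]  # positions 5..8
print(captured)          # 1889 4819 3294 38999
print(median(captured))  # (3294 + 4819) / 2 = 4056.5
```

This matches the 4056.5 expected for month 6 in the desired output.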

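Similarly, for Group2, month 4, I want to capture the 2 lag data points before 123 (the lead is zero, so nothing after it is captured) and take the median of those 3 values.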
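The same positional lookup for Group2, month 4, again as an illustrative sketch. Note that the position (4) is counted within Group2, not across the whole table, where this is row 15; `sales_g2`, `n_lag`, `n_lead` and `captured` are hypothetical names:

```r
# Sales for Group2, months 1-9 (copied from the table above)
sales_g2 <- c(9263, 6660, 28595, 123, 11855, 15845, 32331, 3188, 38161)

i      <- 4  # position of month 4 (Sales = 123) within Group2
n_lag  <- 2  # two data points before 123
n_lead <- 0  # lead is zero: nothing after 123 is captured

captured <- sales_g2[(i - n_lag):(i + n_lead)]  # positions 2..4
print(captured)          # 6660 28595 123
print(median(captured))  # 6660
```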

I tried an ifelse condition for one specific case to see how it works.

df$output <- ifelse(lag == 0 & lead == 1, median(Sales, lead(Sales, 1)), 0)

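The result was very surprising: R took the median of all the values in the column. Another problem is that, even if it worked, I would have to write many ifelse conditions, so I am looking for a simpler solution.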
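One likely reason for the surprising result: `median()` has the signature `median(x, na.rm = FALSE, ...)`, so a second positional argument is matched to `na.rm` rather than treated as more data. In the attempt above (assuming `lead()` is `dplyr::lead()`), `lead(Sales, 1)` lands in `na.rm`, only `Sales`, i.e. the whole column, is summarised, and `ifelse()` then recycles that single number. With a multi-element `na.rm`, R 4.2 and later stop with an error instead of a warning. A small illustration with two Sales values:

```r
# median(x, na.rm = FALSE, ...): the second positional argument is na.rm.
wrong <- median(2503, 3734)     # 3734 is matched to na.rm; x is just 2503
right <- median(c(2503, 3734))  # combine the values with c() first

print(wrong)  # 2503
print(right)  # 3118.5
```

So each window's values have to be gathered into one vector (e.g. `Sales[from:to]`) before calling `median()`.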

I'm not sure how to approach the problem and do this for each group in R.

Below is the output I want to achieve.

Group,Date,Month,Sales,lag,lead,Output
Group1,42005,1,2503,0,2,3734
Group1,42036,2,3734,0,2,6631
Group1,42064,3,6631,2,3,4276.5
Group1,42095,4,8606,0,0,8606
Group1,42125,5,1889,0,2,3294
Group1,42156,6,4819,1,2,4056.5
Group1,42186,7,3294,1,0,4056.5
Group1,42217,8,38999,2,0,4819
Group1,42248,9,28372,1,0,33685.5
Group1,42278,10,25396,4,1,23244.5
Group1,42309,11,21093,1,0,23244.5
Group2,42339,1,9263,0,3,7961.5
Group2,42005,2,6660,1,3,9263
Group2,42036,3,28595,2,2,9263
Group2,42064,4,123,2,0,6660
Group2,42095,5,11855,3,3,11855
Group2,42125,6,15845,4,3,13850
Group2,42156,7,32331,2,2,15845
Group2,42186,8,3188,1,1,32331
Group2,42217,9,38161,4,0,15845

Any leads would be highly appreciated.

What am I missing? Please guide me on how to solve this, and point me to any functions I should use.

Thanks,

2 answers:

Answer 0 (score: 2)

df$Output <- sapply(seq(nrow(df)), # For each row (number) in df
                    function(i) 
                      # take the median of Sales from
                      # current row - current lag value
                      # to
                      # current row + current lead value
                      with(df, median(Sales[(i - lag[i]):(i + lead[i])]))) 

Data used:

df <- data.table::fread("
Group,Date,Month,Sales,lag,lead
Group1,42005,1,2503,0,2
Group1,42036,2,3734,0,2
Group1,42064,3,6631,2,3
Group1,42095,4,8606,0,0
Group1,42125,5,1889,0,2
Group1,42156,6,4819,1,2
Group1,42186,7,3294,1,0
Group1,42217,8,38999,2,0
Group1,42248,9,28372,1,0
Group1,42278,10,25396,4,1
Group1,42309,11,21093,1,0
Group2,42339,1,9263,0,3
Group2,42005,2,6660,1,3
Group2,42036,3,28595,2,2
Group2,42064,4,123,2,0
Group2,42095,5,11855,3,3
Group2,42125,6,15845,4,3
Group2,42156,7,32331,2,2
Group2,42186,8,3188,1,1
Group2,42217,9,38161,4,0
")


dout <- data.table::fread("
Group,Date,Month,Sales,lag,lead,Output
Group1,42005,1,2503,0,2,3734
Group1,42036,2,3734,0,2,6631
Group1,42064,3,6631,2,3,4276.5
Group1,42095,4,8606,0,0,8606
Group1,42125,5,1889,0,2,3294
Group1,42156,6,4819,1,2,4056.5
Group1,42186,7,3294,1,0,4056.5
Group1,42217,8,38999,2,0,4819
Group1,42248,9,28372,1,0,33685.5
Group1,42278,10,25396,4,1,23244.5
Group1,42309,11,21093,1,0,23244.5
Group2,42339,1,9263,0,3,7961.5
Group2,42005,2,6660,1,3,9263
Group2,42036,3,28595,2,2,9263
Group2,42064,4,123,2,0,6660
Group2,42095,5,11855,3,3,11855
Group2,42125,6,15845,4,3,13850
Group2,42156,7,32331,2,2,15845
Group2,42186,8,3188,1,1,32331
Group2,42217,9,38161,4,0,15845
")
all.equal(df$Output, dout$Output)
# [1] TRUE

Answer 1 (score: 1)

library(data.table)
setDT(df)[, i := sequence(.N)][, med := as.numeric(median(df$Sales[(i - lag):(i + lead)])), by = i][, i := NULL][]
     Group  Date Month Sales lag lead     med
 1: Group1 42005     1  2503   0    2  3734.0
 2: Group1 42036     2  3734   0    2  6631.0
 3: Group1 42064     3  6631   2    3  4276.5
 4: Group1 42095     4  8606   0    0  8606.0
 5: Group1 42125     5  1889   0    2  3294.0
 6: Group1 42156     6  4819   1    2  4056.5
 7: Group1 42186     7  3294   1    0  4056.5
 8: Group1 42217     8 38999   2    0  4819.0
 9: Group1 42248     9 28372   1    0 33685.5
10: Group1 42278    10 25396   4    1 23244.5
11: Group1 42309    11 21093   1    0 23244.5
12: Group2 42339     1  9263   0    3  7961.5
13: Group2 42005     2  6660   1    3  9263.0
14: Group2 42036     3 28595   2    2  9263.0
15: Group2 42064     4   123   2    0  6660.0
16: Group2 42095     5 11855   3    3 11855.0
17: Group2 42125     6 15845   4    3 13850.0
18: Group2 42156     7 32331   2    2 15845.0
19: Group2 42186     8  3188   1    1 32331.0
20: Group2 42217     9 38161   4    0 15845.0
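Both answers index rows of the whole table, so in general a window could run from one group into the next. With this data none does, since every window stays inside its own group, but if that matters, a group-aware variant in base R could count positions within each group instead. This is a sketch, not either answer's method: `window_median()` and `dat` are hypothetical names, rows are assumed to be sorted by Month within each group, and clamping windows at a group's first and last row is an assumption, since the question does not say what should happen at the edges.

```r
# Hypothetical helper: for each position i within one group, take the median
# of sales[(i - lags[i]):(i + leads[i])], clamped to the group's own rows.
window_median <- function(sales, lags, leads) {
  n <- length(sales)
  vapply(seq_len(n), function(i) {
    from <- max(1, i - lags[i])   # do not go before the group's first row
    to   <- min(n, i + leads[i])  # do not go past the group's last row
    median(sales[from:to])
  }, numeric(1))
}

dat <- data.frame(
  Group = rep(c("Group1", "Group2"), c(11, 9)),
  Sales = c(2503, 3734, 6631, 8606, 1889, 4819, 3294, 38999, 28372, 25396, 21093,
            9263, 6660, 28595, 123, 11855, 15845, 32331, 3188, 38161),
  lag   = c(0, 0, 2, 0, 0, 1, 1, 2, 1, 4, 1, 0, 1, 2, 2, 3, 4, 2, 1, 4),
  lead  = c(2, 2, 3, 0, 2, 2, 0, 0, 0, 1, 0, 3, 3, 2, 0, 3, 3, 2, 1, 0)
)

# Row indices of each group; apply the helper per group and write back in place.
dat$Output <- NA_real_
for (rows in split(seq_len(nrow(dat)), dat$Group)) {
  dat$Output[rows] <- window_median(dat$Sales[rows], dat$lag[rows], dat$lead[rows])
}
print(dat)
```

With this data the result matches the expected Output column row for row; the per-group split only changes results when a window would otherwise cross a group boundary.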