Applying a cutoff by group in R without using a join

Time: 2018-01-07 02:31:47

Tags: r filter data.table percentile

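The goal of my code is to apply a percentile-based cutoff to a specific column, with the cutoff defined per group.

I found several threads on SO, but unfortunately they either don't apply the filter by group or don't use data.table or base R.

I am specifically looking for an approach that does not use a join. A base R approach is fine, but I would really prefer one based on data.table because I have a large amount of data. I was able to do what I want with a join, but I am looking for a better approach, ideally one that avoids the join.

Here is my input data: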

Input_File <- structure(list(Zone = c("East", "East", "East", "East", "East", 
"East", "East", "West", "West", "West", "West", "West", "West", 
"West"), Fiscal.Year = c(2016, 2016, 2016, 2016, 2016, 2016, 
2017, 2016, 2016, 2016, 2017, 2017, 2018, 2018), Transaction.ID = c(132, 
133, 134, 135, 136, 137, 171, 171, 172, 173, 175, 176, 177, 178
), L.Qty = c(3, 0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 1), A.Qty = c(0, 
0, 0, 2, 2, 3, 0, 0, 0, 0, 0, 3, 0, 0), I.Qty = c(2, 2, 2, 0, 
1, 0, 3, 0, 0, 0, 1, 0, 1, 1)), .Names = c("Zone", "Fiscal.Year", 
"Transaction.ID", "L.Qty", "A.Qty", "I.Qty"), row.names = c(NA, 
-14L), class = "data.frame")

Here is my code (using a join):

  Input_File <- data.table::as.data.table(Input_File)
  Q <- data.table::as.data.table(data.frame(Zone=c("East","West"), Ten_percentile=c(2017,2018)))
  O <- Q[Input_File,on=c("Zone")] [Fiscal.Year>=Ten_percentile]

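A brief explanation of my code: for each Zone, I apply the Ten_percentile cutoff to Fiscal.Year and keep the rows where Fiscal.Year is at or above it.

Here is the cutoff table: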

 Q
   Zone Ten_percentile
1: East           2017
2: West           2018

Here is the expected output:

O
   Zone Ten_percentile Fiscal.Year Transaction.ID L.Qty A.Qty I.Qty
1: East           2017        2017            171     1     0     3
2: West           2018        2018            177     2     0     1
3: West           2018        2018            178     1     0     1

The output in dput format:
structure(list(Zone = structure(c(1L,2L,2L),
  .Label = c("East","West"), class = "factor"),
  Ten_percentile = c(2017,2018,2018),
  Fiscal.Year = c(2017,2018,2018),
  Transaction.ID = c(171,177,178), L.Qty = c(1,2,1),
  A.Qty = c(0,0,0), I.Qty = c(3,1,1)),
  .Names = c("Zone","Ten_percentile","Fiscal.Year","Transaction.ID", 
  "L.Qty","A.Qty","I.Qty"), class = "data.frame", row.names = c(NA, 
-3L))

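Thanks in advance for any help. I am a big fan of data.table, so I would like to learn different ways to solve the same problem and become proficient in both data.table and base R.

1 answer:

Answer 0: (score: 1)

We can do a non-equi join: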

res <- as.data.table(Input_File)[Q, c(.SD, list(Ten_percentile = Ten_percentile)),
                 on = .(Zone, Fiscal.Year >= Ten_percentile)]
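
For comparison, here is a minimal sketch of a filter that avoids the join altogether, which is what the question asks for. It is not part of the original answer. It assumes the per-zone cutoffs can be held in a named vector (the name `cutoff` is hypothetical) and uses a small stand-in table `DT` with the same columns as `Input_File`, only fewer rows.

```r
library(data.table)

# Small stand-in for the question's Input_File (same column names, fewer rows).
DT <- data.table(
  Zone           = c("East", "East", "West", "West", "West"),
  Fiscal.Year    = c(2016, 2017, 2016, 2018, 2018),
  Transaction.ID = c(132, 171, 171, 177, 178)
)

# Hypothetical named vector holding the per-zone cutoffs from table Q.
cutoff <- c(East = 2017, West = 2018)

# Look up one cutoff per row by zone name, then filter; no join is involved.
# as.character() guards against Zone being a factor, which would otherwise
# index the vector by integer code instead of by name.
res <- DT[Fiscal.Year >= cutoff[as.character(Zone)]]
res
```

If the cutoffs already live in a table such as `Q`, the named vector could probably be built with `setNames(Q$Ten_percentile, Q$Zone)`. Unlike the join, this keeps only the original columns, so `Ten_percentile` would have to be added back separately if it is needed in the output.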