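The goal of my code is to apply a percentile-based cutoff to a specific column, with the cutoff defined per group.
I found several threads on SO, for example:
Unfortunately, those threads either don't apply the filter by group or don't use data.table or base R.
I am specifically looking for a method that does not use a join. Base R approaches are fine, but I would really prefer a data.table approach because I have a large amount of data. I can already do what I want with a join, but I'm looking for a better way, ideally one that avoids the join.
Here is my input data: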
Input_File <- structure(list(Zone = c("East", "East", "East", "East", "East",
"East", "East", "West", "West", "West", "West", "West", "West",
"West"), Fiscal.Year = c(2016, 2016, 2016, 2016, 2016, 2016,
2017, 2016, 2016, 2016, 2017, 2017, 2018, 2018), Transaction.ID = c(132,
133, 134, 135, 136, 137, 171, 171, 172, 173, 175, 176, 177, 178
), L.Qty = c(3, 0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 1), A.Qty = c(0,
0, 0, 2, 2, 3, 0, 0, 0, 0, 0, 3, 0, 0), I.Qty = c(2, 2, 2, 0,
1, 0, 3, 0, 0, 0, 1, 0, 1, 1)), .Names = c("Zone", "Fiscal.Year",
"Transaction.ID", "L.Qty", "A.Qty", "I.Qty"), row.names = c(NA,
-14L), class = "data.frame")
Here is my code (using a join):
Input_File <- data.table::as.data.table(Input_File)
Q <- data.table::as.data.table(data.frame(Zone=c("East","West"), Ten_percentile=c(2017,2018)))
O <- Q[Input_File,on=c("Zone")] [Fiscal.Year>=Ten_percentile]
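A brief note on my code: within each Zone, I apply the Ten_percentile cutoff to Fiscal.Year, keeping only the rows where Fiscal.Year is at or above that zone's cutoff.
Here is the cutoff table: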
Q
Zone Ten_percentile
1: East 2017
2: West 2018
Here is the expected output:
O
Zone Ten_percentile Fiscal.Year Transaction.ID L.Qty A.Qty I.Qty
1: East 2017 2017 171 1 0 3
2: West 2018 2018 177 2 0 1
3: West 2018 2018 178 1 0 1
In dput format:
structure(list(Zone = structure(c(1L,2L,2L),
.Label = c("East","West"), class = "factor"),
Ten_percentile = c(2017,2018,2018),
Fiscal.Year = c(2017,2018,2018),
Transaction.ID = c(171,177,178), L.Qty = c(1,2,1),
A.Qty = c(0,0,0), I.Qty = c(3,1,1)),
.Names = c("Zone","Ten_percentile","Fiscal.Year","Transaction.ID",
"L.Qty","A.Qty","I.Qty"), class = "data.frame", row.names = c(NA,
-3L))
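Thanks in advance for any help. I'm a big fan of data.table, so I'd like to learn different ways of solving the same problem and become proficient in both data.table and base R.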
Answer 0 (score: 1)
We can do a non-equi join:
res <- as.data.table(Input_File)[Q, c(.SD, list(Ten_percentile = Ten_percentile)),
on = .(Zone, Fiscal.Year >= Ten_percentile)]
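Since the question asks for a way to avoid the join altogether, one possible alternative is to turn the cutoff table into a named lookup vector and index it by Zone inside the filter. The following is only a minimal sketch under stated assumptions: the names DT, cutoff, res_dt, and res_base are hypothetical, and the data is a reduced stand-in for Input_File with the same column names.

```r
library(data.table)

# Reduced stand-in for Input_File (assumption: same column names, fewer rows).
DT <- data.table(
  Zone = c("East", "East", "West", "West", "West"),
  Fiscal.Year = c(2016, 2017, 2016, 2018, 2018),
  Transaction.ID = c(132, 171, 171, 177, 178)
)

# Cutoff table as in the question.
Q <- data.table(Zone = c("East", "West"), Ten_percentile = c(2017, 2018))

# Named lookup vector: names are zones, values are cutoffs.
# as.character() guards against Zone being stored as a factor.
cutoff <- setNames(Q$Ten_percentile, as.character(Q$Zone))

# data.table: look up each row's cutoff by Zone and filter, with no join.
res_dt <- DT[Fiscal.Year >= unname(cutoff[as.character(Zone)])]

# Base R: the same idea on a plain data.frame.
df <- as.data.frame(DT)
res_base <- df[df$Fiscal.Year >= unname(cutoff[as.character(df$Zone)]), ]
```

Unlike the join-based versions, this keeps only the original columns; if Ten_percentile is also needed in the result, it can be added from the same lookup vector. Whether this is faster than a join on large data is not something this sketch establishes, so it would need benchmarking.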