Question

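I have a table of (x, y) points and want to create a second table that summarizes them.

I want each row of the summary table to show the sum of all y values where x is greater than a threshold, with one row per value in a sequence of thresholds. But I can't figure out how to get each row's threshold into the inner sum.

This is as far as I've got: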

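library(data.table)

samples <- data.table(x = seq(1, 100, 1), y = seq(1, 100, 1))
thresholds <- seq(10, 100, 10)
# This line fails: xThreshold is not visible inside samples[...]
thresholdedSums <- data.table(xThreshold = thresholds, ySumWhereXGreaterThanThreshold = sum(samples[x > xThreshold, y]))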

Error in eval(expr, envir, enclos) : object 'xThreshold' not found

How would I achieve this, or is there a different way to do this kind of thing?

To clarify the desired output:

thresholdedSums = 
[
  (row 1) threshold = 10, ySumWhereXGreaterThanThreshold = sum of all y values in samples[] where x > 10,
  (row 2) threshold = 20, ySumWhereXGreaterThanThreshold = sum of all y values in samples[] where x > 20,
  ... etc ...
]

Answer 1

The result can be obtained with the following code. This solution is not entirely data.table-based, but it works reliably.

thresholdedSums <- data.table(
                     thres = thresholds,
                     Sum = sapply(thresholds, function(thres) samples[x > thres, sum(y)])
                   )

#     thres  Sum
#  1:    10 4995
#  2:    20 4840
#  3:    30 4585
#  4:    40 4230
#  5:    50 3775
#  6:    60 3220
#  7:    70 2565
#  8:    80 1810
#  9:    90  955
# 10:   100    0
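
Since this solution is described as not entirely data.table-based, here is a rough sketch of how the same sums might be computed with a data.table non-equi join instead. It assumes data.table 1.9.8 or later (where non-equi joins are available); the names thr and res are made up for this illustration and are not from the original post.

```r
library(data.table)

samples <- data.table(x = seq(1, 100, 1), y = seq(1, 100, 1))
thresholds <- seq(10, 100, 10)

# Hypothetical helper table: one row per threshold
thr <- data.table(thres = thresholds)

# For each row of thr, join to the rows of samples where x > thres
# and sum y within that group (by = .EACHI evaluates j once per thr row).
# na.rm = TRUE turns the unmatched case (threshold 100) into 0 rather than NA.
res <- samples[thr, on = .(x > thres), .(Sum = sum(y, na.rm = TRUE)), by = .EACHI]

# The first column of res should carry the threshold values.
print(res)
```

Whether this is preferable to the sapply version is a judgment call: for ten thresholds the difference is negligible, and the sapply form is arguably easier to follow.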

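Additional notes: sapply(thresholds, function(thres) samples[x > thres, sum(y)]) returns a vector of the same length as thresholds. You can read it as: for each element of thresholds, run the function function(thres) samples[x > thres, sum(y)] and return the results as a vector. Compared with a for loop, this approach is usually easier to read and often performs at least as well.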
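
To make the comparison with a for loop concrete, the sketch below writes out the loop that the sapply call replaces, plus a vapply variant that declares the expected result type. The names loopSums, vapplySums and the loop index k are illustrative only and do not appear in the original answer.

```r
library(data.table)

samples <- data.table(x = seq(1, 100, 1), y = seq(1, 100, 1))
thresholds <- seq(10, 100, 10)

# Explicit loop: preallocate the result, then fill one sum per threshold
loopSums <- numeric(length(thresholds))
for (k in seq_along(thresholds)) {
  loopSums[k] <- samples[x > thresholds[k], sum(y)]
}

# vapply works like sapply, but states that each call returns one number,
# so an unexpected result shape raises an error instead of passing silently
vapplySums <- vapply(
  thresholds,
  function(thres) samples[x > thres, sum(y)],
  numeric(1)
)

# Both approaches should agree
print(all.equal(loopSums, vapplySums))
```

Both versions do the same work per threshold; the apply-style call mainly saves the preallocation and indexing bookkeeping.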

How do I combine and summarize R data.table row values from different tables of different sizes?
