Question

I can currently compute averages quickly over my dataset of several million rows using the following code:

PosAvg = mean( curTweets$posScore[curTweets$posScore > 1])
uniqPosTweets = curTweets[ curTweets$posScore > abs(curTweets$negScore) ,]
UniqPosAvg = mean( uniqPosTweets$posScore )

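However, I would like to weight these values and still keep the same efficiency as above.

The values of curTweets$posScore / curTweets$negScore can be 1, 2, 3, 4, 5.

Let's say I want to give them the following weights: 6, 7, 8, 9, 10 respectively. I picked these numbers only so they are easy to tell apart from the possible values of posScore. The actual weights are computed in my algorithm.

Is there a way to do this? I can't figure out how to apply the weighting while keeping this efficiency. Do I have to loop over every entry and compute its contribution individually?

Thanks!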

Answer 1

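foo <- seq(5)                   # example values 1..5
weights <- c(1, 1, 1, 1, 100)   # one weight per element of foo
# Element-wise product plus sum(): vectorized, no explicit loop
vectorized_weighted_mean <- sum(foo * weights) / sum(weights)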
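
Applied to the question's setup, the same idea might look like the sketch below. The data frame contents are made up for illustration, negScore is assumed to be stored as a negative number (the question only hints at this through abs()), and weightTable is a hypothetical name for a lookup vector holding the example weights 6 to 10. Base R's weighted.mean() is shown alongside the manual formula.

```r
# Hypothetical stand-in for curTweets; the real data has millions of rows
curTweets <- data.frame(
  posScore = c(1, 2, 5, 3, 5, 4),
  negScore = c(-2, -1, -1, -4, -3, -4)  # assumed negative, hence abs() below
)

# Hypothetical lookup: position i holds the weight for score i
weightTable <- c(6, 7, 8, 9, 10)

# Same filter as in the question, computed once as a logical vector
keep <- curTweets$posScore > abs(curTweets$negScore)
scores <- curTweets$posScore[keep]

# Vectorized lookup: index the weight table by the scores themselves
w <- weightTable[scores]

# Manual formula and the base R equivalent
UniqPosWeightedAvg <- sum(scores * w) / sum(w)
UniqPosWeightedAvgBuiltin <- weighted.mean(scores, w)
```

Indexing weightTable by the score works only because the scores are the integers 1 to 5. If the weights come from a more involved calculation, any vectorized function that returns one weight per row could take its place.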

R - Vectorized mean with weighting
