Question

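I have a small problem in R:

Suppose I have a data frame with two columns, one containing frequencies and the other containing scores. I suspect that the variance of the scores depends on the frequency. So I want to normalize my scores to mean = 0 and var = 1 within bins of frequency.

For example, say I want 10 bins. First, each score is assigned to a bin; then, within that bin, each score is normalized by the mean and variance of all the scores in that bin.

The result should be a third column containing the normalized values.

Partitioning the data is easy with bins = cut(frequencies, b=bins, 1:bins), but I haven't found a way to go on from there.

Thanks in advance!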

Answer 1

scale is your friend for normalizing to mean = 0 and sd = 1; and if sd = 1, then var = 1 as well.

> mean(scale(1:10))
[1] 0
> sd(scale(1:10))
[1] 1
> var(scale(1:10))
     [,1]
[1,]    1
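
As a rough sketch of what scale does with its default arguments (subtract the mean, then divide by the sample standard deviation), the same values can be computed by hand. The variable names x and manual are only illustrative.

```r
x <- 1:10

# Centre on the mean, then divide by the sample standard deviation
manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so flatten it before comparing
all.equal(manual, as.numeric(scale(x)))
```

This is also why var(scale(1:10)) prints as a 1x1 matrix above: the result of scale keeps its matrix shape.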

Try it on some example data:

set.seed(42)
dat <- data.frame(freq=sample(1:100), scores=rnorm(100, mean=4, sd=2))
dat$bins <- cut(dat$freq, breaks=c(0, 1:10*10), include.lowest=TRUE)
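
Before scaling, it may be worth checking how the rows were distributed over the bins. The sketch below rebuilds the same example data so that it runs on its own; the checks themselves are an addition, not part of the original answer.

```r
set.seed(42)
dat <- data.frame(freq = sample(1:100), scores = rnorm(100, mean = 4, sd = 2))
dat$bins <- cut(dat$freq, breaks = c(0, 1:10 * 10), include.lowest = TRUE)

# Rows per bin; freq is a permutation of 1:100, so every bin holds 10 rows
table(dat$bins)

# TRUE here would mean some frequency fell outside the breaks
anyNA(dat$bins)
```

With real data the bins will usually be uneven, and very small bins give unstable estimates of the mean and standard deviation.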

Now use ave to scale the scores within each of the bins:

dat$scaled <- with(dat,ave(scores,bins,FUN=scale))

You can check the results using aggregate or similar:

The mean in each bin is 0 (or very close to it, up to rounding error):

> aggregate(scaled ~ bins, data=dat, FUN=function(x) round(mean(x), 2) )
       bins scaled
1    [0,10]      0
2   (10,20]      0
3   (20,30]      0
4   (30,40]      0
5   (40,50]      0
6   (50,60]      0
7   (60,70]      0
8   (70,80]      0
9   (80,90]      0
10 (90,100]      0

The sd in each bin is 1:

> aggregate(scaled ~ bins, data=dat, FUN=sd)
       bins scaled
1    [0,10]      1
2   (10,20]      1
3   (20,30]      1
4   (30,40]      1
5   (40,50]      1
6   (50,60]      1
7   (60,70]      1
8   (70,80]      1
9   (80,90]      1
10 (90,100]      1
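
If a plain numeric vector is preferred over the matrix that scale returns, one possible variant is to pass ave a small helper that does the arithmetic directly. This is a sketch under the same example data; the helper standardise and the column scaled2 are hypothetical names introduced here for illustration.

```r
set.seed(42)
dat <- data.frame(freq = sample(1:100), scores = rnorm(100, mean = 4, sd = 2))
dat$bins <- cut(dat$freq, breaks = c(0, 1:10 * 10), include.lowest = TRUE)

# Hypothetical helper: standardise one group to mean 0 and sd 1
standardise <- function(x) (x - mean(x)) / sd(x)

# ave() applies the helper within each bin and keeps the original row order
dat$scaled2 <- ave(dat$scores, dat$bins, FUN = standardise)

# Per-bin summaries: means should be about 0, standard deviations about 1
tapply(dat$scaled2, dat$bins, mean)
tapply(dat$scaled2, dat$bins, sd)
```

Note that a bin containing a single score has an sd of NA, so its standardised value will be NA as well.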


R quirk: normalizing the contents of a vector by the binned values of another vector
