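I am trying to normalize some columns of a data frame so that they all have the same mean. The solution I am using now works, but it feels like there should be a simpler way.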
# we make a copy of women
w = women
# print out the col Means
colMeans(women)
height weight
65.0000 136.7333
# create a vector of factors to normalize with
factor = colMeans(women)/colMeans(women)[1]
# normalize the copy of women that we previously made
for(i in 1:length(factor)){w[,i] <- w[,i] / factor[i]}
# we achieved our goal: both columns now have the same mean
colMeans(w)
height weight
65 65
I can do the same thing easily with apply, but is there something simpler, like just doing women/factor, that gives the correct answer?
By the way, what is women/factor actually doing? For example:
colMeans(women/factor)
height weight
49.08646 98.40094
The result is not the same.
Answer 0 (score: 1)
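One way to do this is with sweep. By default this function subtracts the summary statistics in STATS along the chosen margin (with MARGIN = 2, STATS[j] is taken out of every entry of column j), but you can pass a different function through FUN. In this case, division: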
colMeans(sweep(women, 2, factor, '/'))
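To make the MARGIN / FUN behaviour concrete, here is a minimal sketch on a toy 2 x 3 matrix (m, s, subtracted, divided and normalized are illustrative names that do not appear in the thread), followed by the same call on women:

```r
# Toy matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)
s <- c(1, 3, 5)                  # one statistic per column (MARGIN = 2)

subtracted <- sweep(m, 2, s)     # default FUN = "-": column j minus s[j]
divided <- sweep(m, 2, s, "/")   # FUN = "/": column j divided by s[j]

# The same idea on women: divide each column by its mean relative to height
factor <- colMeans(women) / colMeans(women)[1]
normalized <- sweep(women, 2, factor, "/")
colMeans(normalized)             # both columns now average 65
```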
Answer 1 (score: 1)
Alternatively:
rowMeans(t(women)/factor)
#height weight
#65 65
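Why this works: t(women) is a 2 x 15 matrix with one row per variable, so recycling the length-2 factor down each column always pairs row 1 with factor[1] and row 2 with factor[2]. A minimal sketch with a made-up 2 x 3 matrix (m, v and scaled are illustrative names):

```r
m <- rbind(height = c(10, 20, 30),
           weight = c(100, 200, 300))
v <- c(10, 100)

# v is recycled down each column: m[1, j] / v[1], m[2, j] / v[2]
scaled <- m / v          # both rows become 1, 2, 3
rowMeans(scaled)         # height 2, weight 2

# Same mechanism on the real data
factor <- colMeans(women) / colMeans(women)[1]
rowMeans(t(women) / factor)   # height 65, weight 65
```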
Regarding your question:
I can come up with the same thing easily using apply, but is there something easier, like just doing women/factor, that gives the correct answer? By the way, what is women/factor actually doing?
women/factor ## is similar to
unlist(women)/rep(factor,nrow(women))
What you need is:
unlist(women)/rep(factor, each=nrow(women))
or
women/rep(factor, each=nrow(women))
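The difference between the two rep() calls is easiest to see on a short vector. In this sketch f stands in for factor and 3 for nrow(women); f and fixed are illustrative names:

```r
f <- c(1, 2)

rep(f, 3)          # 1 2 1 2 1 2: alternates, like the implicit recycling in women/factor
rep(f, each = 3)   # 1 1 1 2 2 2: one block per column, which is what we want

# Applied to women: the first 15 divisors go to height, the next 15 to weight
factor <- colMeans(women) / colMeans(women)[1]
fixed <- women / rep(factor, each = nrow(women))
colMeans(fixed)    # both columns average 65
```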
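In my solution I did not use rep, because each column of t(women) holds one (height, weight) pair, so factor is recycled down every column as needed and lines up with the rows automatically.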
t(women) ## a 2 x 15 matrix: one row per variable
as.vector(t(women))/factor # the same values, flattened into a plain vector
or just
t(women)/factor # keeps the dimensions, so rowMeans() can be applied
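In short, the operations here are column-wise: R stores matrices in column-major order and recycles the shorter vector down one column after another.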
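The column-major rule can be checked directly. In this sketch m and flat are illustrative names; the last lines show that flattening t(women) interleaves height and weight, which is exactly the order a length-2 factor is recycled against:

```r
m <- matrix(1:6, nrow = 2)   # filled column by column
as.vector(m)                 # 1 2 3 4 5 6: down column 1, then column 2, ...

# Each column of t(women) is one (height, weight) pair, so flattening it
# alternates the two variables
flat <- as.vector(t(women))
head(flat, 4)                # 58 115 59 117
```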
Answer 2 (score: 1)
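You can also use mapply (on women itself; the copy w from the question has already been normalized by the loop):
colMeans(mapply("/", women, factor))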
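mapply() walks over the columns of women and the elements of factor in parallel, calling "/" once per pair. A minimal sketch (normalized is an illustrative name):

```r
factor <- colMeans(women) / colMeans(women)[1]

# Calls "/"(women$height, factor[1]) and "/"(women$weight, factor[2]);
# the two length-15 results are simplified into a 15 x 2 matrix
normalized <- mapply("/", women, factor)

dim(normalized)       # 15 2
colMeans(normalized)  # both columns average 65
```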
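Regarding what women/factor does: women is a data.frame with two columns, while factor is a numeric vector of length 2. When you compute women/factor, R does not match factor to the columns. Instead it walks through the entries of women in column-major order (down the height column, then down the weight column) and divides them alternately by factor[1] and factor[2]. Because factor is shorter than women, R recycles it over and over until every entry has been used. With 15 rows (an odd number), the heights in rows 1, 3, ..., 15 are divided by factor[1] and those in rows 2, 4, ..., 14 by factor[2]; in the weight column the pattern flips. That is why the column means come out wrong.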
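The recycling can be reproduced by hand. In this sketch n, divisor and manual are illustrative names; rep_len() builds the 30 alternating divisors that women/factor uses implicitly:

```r
factor <- colMeans(women) / colMeans(women)[1]

n <- nrow(women) * ncol(women)          # 30 entries in total
divisor <- rep_len(factor, n)           # factor[1], factor[2], factor[1], ...

# unlist() flattens women column by column, matching R's recycling order
manual <- matrix(unlist(women) / divisor, nrow = nrow(women))
colnames(manual) <- names(women)

all.equal(unname(as.matrix(women / factor)), unname(manual))  # TRUE
colMeans(manual)   # 49.08646 98.40094, the "wrong" means from the question
```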
For example, every other entry of women[, 1]/factor (the 1st, 3rd, 5th, ...) equals the corresponding entry of women[, 1], because factor[1] equals 1.
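A quick check of that claim, as a sketch (x and odd are illustrative names). Dividing a length-15 vector by a length-2 vector also triggers R's "longer object length is not a multiple of shorter object length" warning, which is suppressed here:

```r
factor <- colMeans(women) / colMeans(women)[1]

# 15 entries against a length-2 divisor: R recycles and warns that
# 15 is not a multiple of 2, so the warning is suppressed here
x <- suppressWarnings(women[, 1] / factor)

odd <- seq(1, length(x), by = 2)              # positions 1, 3, ..., 15
all(x[odd] == women[odd, 1])                  # TRUE: divided by factor[1], which is 1
all(x[-odd] == women[-odd, 1] / factor[2])    # TRUE: divided by factor[2]
```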