How do I transform the columns of a data frame by the values in a vector in R?

Asked: 2014-06-25 13:22:51

Tags: r

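I am trying to normalize some columns of a data frame so that they have the same mean. The solution I am using now works, but it feels like there should be a simpler way.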

# we make a copy of women
w = women
# print out the col Means
colMeans(women)
height   weight 
65.0000 136.7333
# create a vector of factors to normalize with
factor = colMeans(women)/colMeans(women)[1]
# normalize the copy of women that we previously made
for(i in 1:length(factor)){w[,i] <- w[,i] / factor[i]}
# we achieved our goal: the columns now have the same mean
colMeans(w)
height weight 
65     65

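I can easily come up with the same thing using apply, but is there something easier, like just doing women/factor and getting the correct answer? By the way, what is women/factor actually doing? For example: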

colMeans(women/factor)
height   weight  
49.08646 98.40094

The results are not the same.

3 answers:

Answer 0: (score: 1)

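One way to do this is to use sweep. By default, this function subtracts a summary statistic (its default FUN is "-") along the margin you choose, but you can specify a different function instead. In this case, a division across columns (MARGIN = 2):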

colMeans(sweep(women, 2, factor, '/'))
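To make the sweep call easier to follow, here is a minimal sketch that builds the scaling vector and checks the result. The name scale_by is only an illustrative choice (not from the question), used here so that the base function factor() is not masked.

```r
# Scaling factors: each column mean relative to the mean of the first column.
# `scale_by` is an illustrative name chosen to avoid masking base::factor().
scale_by <- colMeans(women) / colMeans(women)[1]

# MARGIN = 2 pairs scale_by[j] with column j; FUN = "/" replaces the default "-".
normalized <- sweep(women, 2, scale_by, "/")

# Both column means should now match the mean of the first column.
colMeans(normalized)
```

Because sweep lines the vector up with the chosen margin, no explicit rep() or transposition is needed.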

Answer 1: (score: 1)

Also:

rowMeans(t(women)/factor)
#height weight 
#65     65 

Regarding your question:

I can easily come up with the same thing using apply, but is there something easier, like just doing women/factor and getting the correct answer? By the way, what is women/factor actually doing?

women/factor ## is similar to

unlist(women)/rep(factor,nrow(women))

What you need is:

unlist(women)/rep(factor, each=nrow(women))

women/rep(factor, each=nrow(women))

In my solution I didn't use rep, because factor is recycled as needed.

t(women) ##matrix

as.vector(t(women))/factor #will give same result as above

or just

t(women)/factor #preserve the dimensions for ?rowMeans

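In short, the operation proceeds column-wise: R walks through the values in column-major order and recycles factor as it goes.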
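The recycling order is easier to see on a tiny example. The sketch below uses a made-up 3 x 2 matrix m and vector v (neither is from the question) to compare plain recycling, rep(..., each = ), and the transpose approach.

```r
m <- matrix(1:6, nrow = 3)   # columns are (1, 2, 3) and (4, 5, 6)
v <- c(10, 100)              # illustrative per-column divisors

# Default recycling follows storage (column-major) order, so v alternates
# down each column instead of lining up with the columns.
m / v

# Repeating each element nrow(m) times aligns v[j] with column j.
m / rep(v, each = nrow(m))

# After transposing, the original columns are rows, so plain recycling of v
# lines up with them; transposing back restores the original shape.
t(t(m) / v)
```

The last two expressions should give the same matrix, while the first mixes the two divisors within each column.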

Answer 2: (score: 1)

You can also use mapply:

# use the original women here: w was already normalized by the loop in the question
colMeans(mapply("/", women, factor))

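As for what women/factor does: women is a data.frame with two columns, while factor is a numeric vector of length 2. So when you do women/factor, R takes each entry of women (i.e., women[i,j]) in turn and divides it by factor[1], then the next by factor[2], and so on. Because factor is shorter than women, R recycles factor over and over. For example, you can see that every second entry of women[, 1]/factor (the 1st, 3rd, 5th, ...) equals the corresponding entry of women[, 1], because factor[1] equals 1.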
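The claim about every second entry can be checked directly. This is a small sketch; scale_by and odd are illustrative names that do not appear in the question.

```r
# Same scaling vector as in the question; scale_by[1] is 65 / 65 = 1.
scale_by <- colMeans(women) / colMeans(women)[1]

# 15 heights divided by a length-2 vector: R recycles the shorter vector and
# warns that the longer length is not a multiple of the shorter one.
h <- suppressWarnings(women[, 1] / scale_by)

# Positions 1, 3, 5, ... were divided by scale_by[1], so they are unchanged.
odd <- seq(1, nrow(women), by = 2)
all.equal(h[odd], women[odd, 1])
```

The even positions, by contrast, were divided by scale_by[2] and therefore differ from the original heights.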