Question

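I am trying to normalize some columns of a data frame so that they all have the same mean. The solution I am using now works, but it feels like there should be a simpler way.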

# we make a copy of women
w = women
# print out the col Means
colMeans(women)
height   weight 
65.0000 136.7333
# create a vector of factors to normalize with
factor = colMeans(women)/colMeans(women)[1]
# normalize the copy of women that we previously made
for (i in seq_along(factor)) w[, i] <- w[, i] / factor[i]
# the columns now have the same mean
colMeans(w)
height weight 
65     65

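I can easily come up with the same thing using apply, but is there something simpler, like just doing women/factor and getting the correct answer? By the way, what is women/factor actually doing? For example: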

colMeans(women/factor)
height   weight  
49.08646 98.40094

The result is not the same.

Answer 1

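One way of doing this is with sweep. By default this function subtracts a summary statistic along the margin you choose (here 2, the columns), but you can specify a different function to apply. In this case, a division: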

colMeans(sweep(women, 2, factor, '/'))
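A slightly fuller sketch of the same idea, for anyone who wants to keep the rescaled data frame and not only its column means. The names scale_by and w_scaled are made up for this illustration (the question calls the vector factor, which shadows a base R function); women is R's built-in dataset.

```r
# Ratio of each column mean to the mean of the first column.
# scale_by is a hypothetical name for what the question calls `factor`.
scale_by <- colMeans(women) / colMeans(women)[1]

# MARGIN = 2 pairs scale_by[j] with column j; FUN = "/" replaces the default "-".
w_scaled <- sweep(women, 2, scale_by, "/")

# Both columns should now share the mean of the first column.
colMeans(w_scaled)
```

Because the margin is stated explicitly, this does not rely on the order in which R recycles the vector.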

Answer 2

Alternatively:

rowMeans(t(women)/factor)
#height weight 
#65     65

Regarding your question:

I can easily come up with the same thing using apply, but is there something simpler, like just doing women/factor and getting the correct answer? By the way, what is women/factor actually doing?

women/factor ## is similar to

unlist(women)/rep(factor,nrow(women))

What you need is:

unlist(women)/rep(factor, each=nrow(women))

or

women/rep(factor, each=nrow(women))

In my solution I did not use rep, because factor is recycled as needed.

t(women) ##matrix

as.vector(t(women))/factor #will give same result as above

or simply

t(women)/factor # keeps the matrix dimensions, so rowMeans can be applied

In short, the operation runs column by column: R walks down each column in turn while cycling through factor.
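The recycling order is easier to see on a toy example. This is only a sketch: d, v, naive, aligned and transposed are made-up names, and the values are chosen so that the divisions can be checked by eye.

```r
# Toy data frame: 3 rows, 2 columns (values chosen only for illustration).
d <- data.frame(a = c(10, 20, 30), b = c(100, 200, 300))
v <- c(1, 10)

# d / v walks down column a, then column b, cycling through v as it goes,
# so the divisors are 1, 10, 1 for a and 10, 1, 10 for b.
naive <- d / v

# Repeating each divisor once per row lines divisor j up with column j.
aligned <- d / rep(v, each = nrow(d))

# Transposing puts one variable per row; the column-major walk then alternates
# a, b, a, b, ... which matches the cycling of v without any rep().
transposed <- t(d) / v

naive
aligned
transposed
```

In naive the two columns come out as 10, 2, 30 and 10, 200, 30, which is the same kind of mix-up the question ran into; aligned and transposed both give 10, 20, 30 for a and for b.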

Answer 3

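You can also use mapply, applied to the original women (the w from the question has already been rescaled by the loop):

colMeans(mapply("/", women, factor))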

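Regarding your question about what women/factor does: women is a data.frame with two columns, while factor is a numeric vector of length two. So when you do women/factor, R takes each entry of women (i.e. women[i,j]) in turn and divides it by factor[1], then the next by factor[2], and so on. Because factor is shorter than women, R recycles factor over and over. For example, you can see that every second entry of women[, 1]/factor, starting with the first, equals the corresponding entry of women[, 1] (because factor[1] equals 1).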
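That last claim can be checked directly. The snippet below is a small sketch; scale_by, h and odd are hypothetical names introduced here, with scale_by standing in for the question's factor.

```r
# Hypothetical name for the vector the question calls `factor`.
scale_by <- colMeans(women) / colMeans(women)[1]

# 15 heights against 2 divisors: R recycles the shorter vector and warns
# that the lengths do not divide evenly, so the warning is silenced here.
h <- suppressWarnings(women$height / scale_by)

odd <- seq(1, nrow(women), by = 2)

# Odd positions were divided by scale_by[1], which is exactly 1: unchanged.
all(h[odd] == women$height[odd])

# Even positions were divided by scale_by[2], the weight/height mean ratio.
isTRUE(all.equal(h[-odd], women$height[-odd] / scale_by[[2]]))
```

Both checks should return TRUE, which is the alternating pattern described above.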

How to transform the columns of a data frame based on the values in a vector in R?
