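How can I use dplyr to normalize each row of a data frame so that the row's elements sum to one?
I would prefer to use mutate_each, but the result shown below is not what I expect. What is wrong?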
library(dplyr)
iris %>%select(-Species)%>%mutate_each(funs(./sum(.)))%>%head()
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 0.005818597 0.007631923 0.002483591 0.001111729
2 0.005590416 0.006541648 0.002483591 0.001111729
3 0.005362236 0.006977758 0.002306191 0.001111729
4 0.005248146 0.006759703 0.002660990 0.001111729
5 0.005704507 0.007849978 0.002483591 0.001111729
6 0.006160867 0.008504143 0.003015789 0.002223457
Answer 0 (score: 3)
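mutate_each applies the function column by column, so sum(.) is the total of the whole column rather than of the row. It is also deprecated; use mutate_all instead. To normalize each row, first compute the row sums, then divide every column by them: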
row_sum = rowSums(select(iris, -Species))
iris %>% select(-Species) %>% mutate_all(~ ./row_sum) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width
#1 0.5000000 0.3431373 0.1372549 0.01960784
#2 0.5157895 0.3157895 0.1473684 0.02105263
#3 0.5000000 0.3404255 0.1382979 0.02127660
#4 0.4893617 0.3297872 0.1595745 0.02127660
#5 0.4901961 0.3529412 0.1372549 0.01960784
#6 0.4736842 0.3421053 0.1491228 0.03508772
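A quick way to confirm the diagnosis above, as a minimal sketch: the column-wise version makes every column sum to 1, while the row-wise version makes every row sum to 1. The names num, by_col, and by_row are introduced here for illustration only, and mutate_all stands in for the deprecated mutate_each.

```r
library(dplyr)

num <- select(iris, -Species)

# Column-wise: sum(.) is the total of the whole column, as in the question.
by_col <- num %>% mutate_all(~ . / sum(.))

# Row-wise: divide every column by the per-row total.
row_sum <- rowSums(num)
by_row <- num %>% mutate_all(~ . / row_sum)

colSums(by_col)         # each column total is 1
range(rowSums(by_row))  # each row total is 1, up to floating-point error
```

Checking colSums and rowSums like this is a cheap way to see which axis a normalization actually ran along.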
If you prefer a single pipe:
iris %>%
  mutate(row_sum = rowSums(select(., 1:4))) %>%
  mutate_at(1:4, ~ ./row_sum) %>%
  select(-row_sum) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 0.5000000 0.3431373 0.1372549 0.01960784 setosa
#2 0.5157895 0.3157895 0.1473684 0.02105263 setosa
#3 0.5000000 0.3404255 0.1382979 0.02127660 setosa
#4 0.4893617 0.3297872 0.1595745 0.02127660 setosa
#5 0.4901961 0.3529412 0.1372549 0.01960784 setosa
#6 0.4736842 0.3421053 0.1491228 0.03508772 setosa
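In dplyr 1.0.0 and later, the scoped verbs (mutate_all, mutate_at, mutate_if) are superseded by across(). The single-pipe approach could be written as follows; this is a sketch that assumes dplyr >= 1.0.0 is installed, and the name normalized is illustrative.

```r
library(dplyr)

normalized <- iris %>%
  # across() with no function returns the selected columns as a data frame,
  # so rowSums() gives the per-row total of columns 1 to 4.
  mutate(row_sum = rowSums(across(1:4))) %>%
  # Divide each of the four measurement columns by its row total.
  mutate(across(1:4, ~ .x / row_sum)) %>%
  select(-row_sum)

head(normalized)
```

The logic is the same as in the mutate_at version; only the column-selection syntax changes.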
Answer 1 (score: 2)
RS <- rowSums(iris[, 1:4])  # per-row totals of the four numeric columns
iris %>%
  mutate_if(is.numeric, ~ . / RS)  # formula lambda; funs() is deprecated in current dplyr
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 0.5000000 0.3431373 0.1372549 0.01960784 setosa
2 0.5157895 0.3157895 0.1473684 0.02105263 setosa
3 0.5000000 0.3404255 0.1382979 0.02127660 setosa
4 0.4893617 0.3297872 0.1595745 0.02127660 setosa
5 0.4901961 0.3529412 0.1372549 0.01960784 setosa
# etc
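One thing to watch in this answer: RS is computed from columns 1:4, while mutate_if selects columns by type. The two coincide for iris, but they could diverge on a data frame with other numeric columns. A minimal base R sketch that derives both from the same mask (num_cols and normalized are illustrative names):

```r
# Logical mask of numeric columns, used for both the totals and the division.
num_cols <- sapply(iris, is.numeric)

normalized <- iris
# The row-sum vector has one entry per row, so dividing the data frame by it
# divides each column element-wise by the matching row total.
normalized[num_cols] <- iris[num_cols] / rowSums(iris[num_cols])

head(normalized)
```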