I want to compute the mean of each column and put the result in a new row below the data. Let's start with the data:
> head(tbl_mut)
timetE4_1 timetE1_2 timetE2_2 timetE3_2 timetE4_2 eve_mean mor_mean tot_mean
1 4048.605 59094.48 27675.59 26374.06 43310.01 7774.442 39113.53 23443.99
2 45729.986 139889.21 111309.64 129781.17 96924.62 43374.117 119476.16 81425.14
3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4 4466.153 26250.32 20320.08 18413.54 29061.25 3866.547 23511.30 13688.92
This is what I want to achieve:
timetE4_1 timetE1_2 timetE2_2 timetE3_2 timetE4_2 eve_mean mor_mean tot_mean
1 4048.605 59094.48 27675.59 26374.06 43310.01 7774.442 39113.53 23443.99
2 45729.986 139889.21 111309.64 129781.17 96924.62 43374.117 119476.16 81425.14
3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4 4466.153 26250.32 20320.08 18413.54 29061.25 3866.547 23511.30 13688.92
.....
445 X X X X X X X X
X - the mean of the values in the column.
How can I do this?
Answer 0 (score: 5)
Use rbind and colMeans, as follows:
> rbind(tbl_mut, colMeans=colMeans(tbl_mut))
timetE4_1 timetE1_2 timetE2_2 timetE3_2 timetE4_2 eve_mean mor_mean tot_mean
1 4048.605 59094.48 27675.59 26374.06 43310.01 7774.442 39113.53 23443.99
2 45729.986 139889.21 111309.64 129781.17 96924.62 43374.117 119476.16 81425.14
3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4 4466.153 26250.32 20320.08 18413.54 29061.25 3866.547 23511.30 13688.92
colMeans 173482.724 497479.54 319083.15 330634.05 331434.59 160144.458 369657.83 264901.15
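For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds only the four rows printed by head() and only three of the columns, so the means describe those rows rather than the full table. The name demo is made up for this illustration.

```r
# Hypothetical stand-in for tbl_mut: only the four rows printed above,
# and only three of the eight columns.
demo <- data.frame(
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32),
  timetE2_2 = c(27675.59, 111309.64, 1117027.29, 20320.08)
)

# colMeans() returns a named numeric vector with one entry per column;
# naming the rbind() argument sets the row name of the appended row.
with_means <- rbind(demo, colMeans = colMeans(demo))

# na.rm = TRUE skips missing values instead of returning NA for a column
# that contains them (no difference here, since demo has no NA).
with_means_narm <- rbind(demo, colMeans = colMeans(demo, na.rm = TRUE))

print(with_means)
```

If the real data may contain missing values, the na.rm = TRUE variant is probably the one to use.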
Edit
Suppose your data is called df and looks like this:
> df
Description timetE4_1 timetE1_2 timetE2_2 timetE3_2 timetE4_2 eve_mean mor_mean tot_mean
1 A 4048.605 59094.48 27675.59 26374.06 43310.01 7774.442 39113.53 23443.99
2 B 45729.986 139889.21 111309.64 129781.17 96924.62 43374.117 119476.16 81425.14
3 C 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4 D 4466.153 26250.32 20320.08 18413.54 29061.25 3866.547 23511.30 13688.92
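Here Description is a factor variable. rbind fills a plain vector into the columns by position, so the vector of means must have one entry per column of df, with NA under Description. Indexing the means by names(df) takes care of that:
> rbind(df, colMeans = colMeans(df[, sapply(df, is.numeric)])[names(df)])
Description timetE4_1 timetE1_2 timetE2_2 timetE3_2 timetE4_2 eve_mean mor_mean tot_mean
1 A 4048.605 59094.48 27675.59 26374.06 43310.01 7774.442 39113.53 23443.99
2 B 45729.986 139889.21 111309.64 129781.17 96924.62 43374.117 119476.16 81425.14
3 C 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4 D 4466.153 26250.32 20320.08 18413.54 29061.25 3866.547 23511.30 13688.92
colMeans <NA> 173482.724 497479.54 319083.15 330634.05 331434.59 160144.458 369657.83 264901.15
Passing the shorter vector colMeans(df[, sapply(df, is.numeric)]) directly would put the first mean under Description and shift every other mean one column to the left. Wrapping that call in suppressWarnings would only hide the resulting factor-level warning, not fix the alignment.
If you know where the non-numeric variable is, this simplifies to rbind(df, colMeans = c(NA, colMeans(df[, -1]))).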
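Because rbind matches a plain vector to the columns by position, it can help to build the row of means at full width before appending it. The sketch below wraps that idea in a small helper; append_col_means and df_demo are hypothetical names that are not part of the answer, and the data is a two-column excerpt of the table above.

```r
# Hypothetical helper (not from the answer): append a row of column means,
# leaving NA under every non-numeric column.
append_col_means <- function(d, label = "colMeans") {
  num <- vapply(d, is.numeric, logical(1))
  means <- rep(NA_real_, ncol(d))            # one slot per column of d
  means[num] <- colMeans(d[, num, drop = FALSE])
  out <- rbind(d, means)                     # filled in by position
  rownames(out)[nrow(out)] <- label
  out
}

# Excerpt of the example data, with Description stored as a factor.
df_demo <- data.frame(
  Description = factor(c("A", "B", "C", "D")),
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32)
)

res <- append_col_means(df_demo)
print(res)
```

A quick sanity check after any approach of this kind is to compare res["colMeans", "timetE4_1"] with mean(df_demo$timetE4_1); if they differ, the row has been shifted.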
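Answer 1 (score: 4)
R does have a function, addmargins, that lets you do this kind of thing, but it requires a table or matrix as input. In the call below, mydf is assumed to hold only the numeric columns from the question: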
addmargins(as.matrix(mydf), 1, FUN = mean)
# timetE4_1 timetE1_2 timetE2_2 timetE3_2 timetE4_2 eve_mean mor_mean tot_mean
# 1 4048.605 59094.48 27675.59 26374.06 43310.01 7774.442 39113.53 23443.99
# 2 45729.986 139889.21 111309.64 129781.17 96924.62 43374.117 119476.16 81425.14
# 3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
# 4 4466.153 26250.32 20320.08 18413.54 29061.25 3866.547 23511.30 13688.92
# mean 173482.724 497479.54 319083.15 330634.05 331434.59 160144.458 369657.83 264901.15
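The following is a small sketch of how addmargins behaves, using a made-up 3 x 2 matrix m rather than the data from the question. It also shows why the data frame should contain only numeric columns before as.matrix is applied; the names m and mixed are illustrative only.

```r
# Small numeric matrix standing in for as.matrix() of an all-numeric data frame.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(c("1", "2", "3"), c("a", "b")))

# margin = 1 appends a new row that summarises each column;
# FUN = mean replaces the default summary (sum) with the column mean.
with_margin <- addmargins(m, margin = 1, FUN = mean)
print(with_margin)

# as.matrix() on a data frame that mixes numbers and text yields a character
# matrix, so the numeric columns should be selected first.
mixed <- data.frame(id = c("x", "y"), v = c(1, 2))
mixed_is_character <- is.character(as.matrix(mixed))
print(mixed_is_character)
```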
There is an almost identical (conceptually) question here, so I thought I would share my answer here as well.
Suppose we start with:
mydf <- structure(list(Description = c("A", "B", "C", "D"),
timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
Boo = structure(1:4, .Label = c("a", "b", "c", "d"),
class = "factor"), timetE1_2 = c(59094.48, 139889.21,
1764684.16, 26250.32), timetE2_2 = c(27675.59, 111309.64,
1117027.29, 20320.08), Baa = c(FALSE, FALSE, TRUE, NA)),
.Names = c("Description", "timetE4_1", "Boo", "timetE1_2",
"timetE2_2", "Baa"), row.names = c("1", "2", "3", "4"),
class = "data.frame")
mydf
# Description timetE4_1 Boo timetE1_2 timetE2_2 Baa
# 1 A 4048.605 a 59094.48 27675.59 FALSE
# 2 B 45729.986 b 139889.21 111309.64 FALSE
# 3 C 639686.154 c 1764684.16 1117027.29 TRUE
# 4 D 4466.153 d 26250.32 20320.08 NA
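Appending a plain vector with rbind, as in @Jilber's solution, does not work well in this case: the values are matched by position, so they end up in many misplaced columns. Instead, use rbind.fill from the plyr package. I use sapply in this example to specify my function (sum here; swap in mean to get column means), to show that you can easily use any function you want, not just the col* functions.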
library(plyr)
useme <- sapply(mydf, is.numeric)
rbind.fill(mydf, data.frame(t(sapply(mydf[useme], sum))))
# Description timetE4_1 Boo timetE1_2 timetE2_2 Baa
# 1 A 4048.605 a 59094.48 27675.59 FALSE
# 2 B 45729.986 b 139889.21 111309.64 FALSE
# 3 C 639686.154 c 1764684.16 1117027.29 TRUE
# 4 D 4466.153 d 26250.32 20320.08 NA
# 5 <NA> 693930.898 <NA> 1989918.17 1276332.60 NA
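If adding plyr as a dependency is not an option, a similar result can be sketched in base R by building a one-row data frame that holds NA under every non-numeric column and then binding it by name. This is an alternative under stated assumptions, not what the answer uses; small and summary_row are made-up names, and the data is a three-column excerpt of mydf.

```r
# Excerpt of mydf with one character, one numeric and one logical column.
small <- data.frame(
  Description = c("A", "B", "C", "D"),
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  Baa = c(FALSE, FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply f to each numeric column, NA everywhere else.
summary_row <- function(d, f = mean) {
  row <- lapply(d, function(col) if (is.numeric(col)) f(col) else NA)
  as.data.frame(row, stringsAsFactors = FALSE)
}

# rbind() on two data frames matches columns by name, so nothing can shift.
out <- rbind(small, summary_row(small))
print(out)
```

Passing f = sum instead of the default reproduces the kind of totals row shown above.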