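I have a data frame with 5 columns. I know how to compute the mean of one column grouped by another column. However, I need to group by two columns. For example, I want to compute the mean of column 5 grouped by column 1 and column 2.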
df <- structure(list(Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("AT", "CH", "DE"), class = "factor"),
Occupation = c(1L, 3L, 5L, 3L, 1L, 2L, 5L, 3L, 5L, 3L, 1L,
2L, 1L, 5L, 3L, 3L, 1L, 3L, 2L, 5L, 5L, 1L, 2L, 1L, 3L),
Age = c(20L, 46L, 30L, 12L, 73L, 53L, 19L, 43L, 65L, 53L,
19L, 34L, 76L, 25L, 45L, 39L, 18L, 59L, 37L, 24L, 19L, 60L,
51L, 32L, 29L), Gender = structure(c(1L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L), .Label = c("female", "male"), class = "factor"),
Income = c(100L, 80L, 78L, 29L, 156L, 56L, 95L, 104L, 87L,
56L, 203L, 45L, 112L, 78L, 56L, 140L, 99L, 67L, 89L, 109L,
43L, 145L, 30L, 101L, 77L)), class = "data.frame", row.names = c(NA,
-25L))
head(df)
  Country Occupation Age Gender Income
1      AT          1  20 female    100
2      AT          3  46 female     80
3      AT          5  30   male     78
4      AT          3  12   male     29
5      AT          1  73   male    156
6      AT          2  53 female     56
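So what I want to compute is the mean of the "Income" column, grouped by Country and Occupation. For example, the mean "Income" of everyone living in country "AT" with occupation "3", the mean "Income" of everyone living in country "CH" with occupation "1", and so on.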
Answer 0 (score: 1)
(1) Base R approach (aggregate)
mean.df <- aggregate(Income ~ Country + Occupation, df, mean)
names(mean.df)[3] <- "Income_Mean"
merge(df, mean.df)
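If only the per-group means are needed (one row per Country/Occupation pair rather than a mean attached to every original row), the merge step can be skipped. A minimal sketch, assuming a trimmed copy of the question's data that keeps only the Country, Occupation and Income columns; the name group_means is illustrative:

```r
# Trimmed copy of the question's data: Age and Gender are dropped for brevity
df <- data.frame(
  Country = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1,
                 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
             78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# One row per (Country, Occupation) combination that occurs in the data
group_means <- aggregate(Income ~ Country + Occupation, data = df, FUN = mean)

# aggregate() varies the first grouping column fastest; reorder for readability
group_means <- group_means[order(group_means$Country, group_means$Occupation), ]
print(group_means)
```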
(2) Base R approach (tapply)
mean.df1 <- tapply(df$Income, list(df$Country, df$Occupation), mean)
mean.df2 <- as.data.frame(as.table(mean.df1))
names(mean.df2) <- c("Country", "Occupation", "Income_Mean")
merge(df, mean.df2)
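Before it is reshaped into a data frame, the tapply result is a matrix with countries as row names and occupations as column names, so a single group mean can be looked up directly. A minimal sketch, assuming a trimmed copy of the question's data (Age and Gender omitted); the name means is illustrative:

```r
# Trimmed copy of the question's data: Age and Gender are dropped for brevity
df <- data.frame(
  Country = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1,
                 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
             78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# Rows are countries, columns are occupations (column names are character)
means <- tapply(df$Income, list(df$Country, df$Occupation), mean)
print(means)

# Look up one group: country "AT", occupation 3
at_occ3 <- means["AT", "3"]
print(at_occ3)
```

A Country/Occupation pair that never occurs in the data would appear as NA in this matrix.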
(3) stats package approach (ave)
df2 <- df
df2$Income_Mean <- ave(df$Income, df$Country, df$Occupation)
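ave() uses mean by default and returns a vector as long as its input, which is why it can be assigned straight to a new column. A minimal sketch, assuming a trimmed copy of the question's data (Age and Gender omitted), that spells the function out explicitly:

```r
# Trimmed copy of the question's data: Age and Gender are dropped for brevity
df <- data.frame(
  Country = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1,
                 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
             78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# FUN must be passed by name; an unnamed extra argument would be
# treated as another grouping variable
df$Income_Mean <- ave(df$Income, df$Country, df$Occupation, FUN = mean)
print(head(df))
```

Passing a different function, for example FUN = median, gives the per-group median in the same row-aligned form.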
(4) dplyr approach
library(dplyr)
df %>% group_by(Country, Occupation) %>%
  mutate(Income_Mean = mean(Income))
Output:
   Country Occupation   Age Gender Income Income_Mean
   <fct>        <int> <int> <fct>   <int>       <dbl>
 1 AT               1    20 female    100       128
 2 AT               3    46 female     80        71
 3 AT               5    30 male       78        86.5
 4 AT               3    12 male       29        71
 5 AT               1    73 male      156       128
 6 AT               2    53 female     56        56
 7 AT               5    19 male       95        86.5
 8 AT               3    43 male      104        71
 9 CH               5    65 male       87        82.5
10 CH               3    53 female     56        84
# ... with 15 more rows
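The mutate() call keeps all 25 rows and repeats each group mean. If one row per group is wanted instead, summarise() can replace mutate(). A minimal sketch, assuming dplyr 1.0 or later (for the .groups argument) and a trimmed copy of the question's data (Age and Gender omitted); group_means is an illustrative name:

```r
library(dplyr)

# Trimmed copy of the question's data: Age and Gender are dropped for brevity
df <- data.frame(
  Country = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1,
                 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
             78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# Collapse each (Country, Occupation) group to a single row
group_means <- df %>%
  group_by(Country, Occupation) %>%
  summarise(Income_Mean = mean(Income), .groups = "drop")

print(group_means)
```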
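Answer 1 (score: 0)
Using sqldf:
library(sqldf)
# Age and Gender are not in GROUP BY, so SQLite fills them from an arbitrary row of each group
sqldf("select Country, Occupation, Age, Gender, avg(Income) from df group by Country, Occupation")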
OR
Using data.table:
library(data.table)
dt <- as.data.table(df)  # convert to a data.table without overwriting df
dt[, .(Income_Mean = mean(Income)), by = .(Country, Occupation)]
Output (from the sqldf query):
   Country Occupation Age Gender avg(Income)
1       AT          1  73   male       128.0
2       AT          2  53 female        56.0
3       AT          3  43   male        71.0
4       AT          5  19   male        86.5
5       CH          1  18 female       138.0
6       CH          2  34   male        45.0
7       CH          3  39   male        84.0
8       CH          5  25 female        82.5
9       DE          1  32 female       123.0
10      DE          2  51 female        59.5
11      DE          3  29   male        72.0
12      DE          5  19   male        76.0
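With data.table, the group mean can also be attached to every original row, similar to the ave and mutate variants, by using the := operator together with by. A minimal sketch, assuming a trimmed copy of the question's data (Age and Gender omitted); dt is an illustrative name:

```r
library(data.table)

# Trimmed copy of the question's data: Age and Gender are dropped for brevity
dt <- data.table(
  Country = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1,
                 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
             78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# := adds the column by reference; all 25 rows are kept
dt[, Income_Mean := mean(Income), by = .(Country, Occupation)]
print(head(dt))
```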