Question

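I have a data frame with 5 columns. I know how to calculate the mean of one column grouped by another column. However, I need to group by two columns. For example, I want to calculate the mean of column 5 grouped by column 1 and column 2.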

df <- structure(list(Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("AT", "CH", "DE"), class = "factor"), 
    Occupation = c(1L, 3L, 5L, 3L, 1L, 2L, 5L, 3L, 5L, 3L, 1L, 
    2L, 1L, 5L, 3L, 3L, 1L, 3L, 2L, 5L, 5L, 1L, 2L, 1L, 3L), 
    Age = c(20L, 46L, 30L, 12L, 73L, 53L, 19L, 43L, 65L, 53L, 
    19L, 34L, 76L, 25L, 45L, 39L, 18L, 59L, 37L, 24L, 19L, 60L, 
    51L, 32L, 29L), Gender = structure(c(1L, 1L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 
    2L, 2L, 1L, 1L, 2L), .Label = c("female", "male"), class = "factor"), 
    Income = c(100L, 80L, 78L, 29L, 156L, 56L, 95L, 104L, 87L, 
    56L, 203L, 45L, 112L, 78L, 56L, 140L, 99L, 67L, 89L, 109L, 
    43L, 145L, 30L, 101L, 77L)), class = "data.frame", row.names = c(NA, 
-25L))

head(df)

  Country Occupation Age Gender Income
1      AT          1  20 female    100
2      AT          3  46 female     80
3      AT          5  30   male     78
4      AT          3  12   male     29
5      AT          1  73   male    156
6      AT          2  53 female     56

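So what I want is the mean of the "Income" column, grouped by Country and Occupation. For example, the mean Income of everyone who lives in country "AT" with occupation 3, the mean Income of everyone who lives in country "CH" with occupation 1, and so on.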
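To make the target concrete, here is a minimal sketch that computes two of those group means by hand with logical indexing; the answers compute every combination at once. It uses a trimmed copy of the question's data (Age and Gender dropped because they are not needed), and the names `at3` and `ch1` are purely illustrative.

```r
# Trimmed copy of the question's data (Age and Gender omitted)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1, 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
                 78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# Mean Income of everyone in country "AT" with occupation 3
at3 <- mean(df$Income[df$Country == "AT" & df$Occupation == 3])
# Mean Income of everyone in country "CH" with occupation 1
ch1 <- mean(df$Income[df$Country == "CH" & df$Occupation == 1])

at3  # 71
ch1  # 138
```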

Answer 1

(1) Base R (aggregate)

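# One row per Country/Occupation combination, holding the mean Income
mean.df <- aggregate(Income ~ Country + Occupation, df, mean)
names(mean.df)[3] <- "Income_Mean"
# Attach each group's mean to every row; merge() joins on the shared columns
# (Country, Occupation) and returns the rows sorted by those keys
merge(df, mean.df)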
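If you only need the grouped means and not a per-row column, `aggregate()` on its own already answers the question. A sketch on a trimmed copy of the data: the summary has one row per combination that occurs in the data, with the first grouping variable (Country) varying fastest.

```r
# Trimmed copy of the question's data (Age and Gender omitted)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1, 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
                 78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# Summary table: 3 countries x 4 occupations = 12 rows here
mean.df <- aggregate(Income ~ Country + Occupation, data = df, FUN = mean)
names(mean.df)[3] <- "Income_Mean"
print(mean.df)

# Look up a single group in the summary
mean.df$Income_Mean[mean.df$Country == "AT" & mean.df$Occupation == 3]  # 71
```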

(2) Base R (tapply)

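# Country x Occupation matrix of group means (NA for combinations with no rows)
mean.df1 <- tapply(df$Income, list(df$Country, df$Occupation), mean)
# Reshape the matrix to long format: one row per Country/Occupation cell
mean.df2 <- as.data.frame(as.table(mean.df1))
names(mean.df2) <- c("Country", "Occupation", "Income_Mean")
# Attach the means to every row; cells with no data (NA) match no rows and are dropped
merge(df, mean.df2)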
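A sketch of the intermediate objects in the tapply approach, on a trimmed copy of the data: `tapply()` returns a matrix with one row per Country and one column per Occupation, and `as.data.frame(as.table(...))` names its columns `Var1`, `Var2` and `Freq` by default, which is why the answer renames them.

```r
# Trimmed copy of the question's data (Age and Gender omitted)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1, 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
                 78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# Rows = Country levels, columns = Occupation values; each cell is a group mean
mean.df1 <- tapply(df$Income, list(df$Country, df$Occupation), mean)
print(mean.df1)

# Cells are indexed by the dimnames, which are character labels
mean.df1["AT", "3"]  # 71

# Long format before renaming: default column names are Var1, Var2, Freq
long <- as.data.frame(as.table(mean.df1))
names(long)
```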

(3) stats package (ave)

df2 <- df
# ave() returns a vector as long as Income, with each value replaced by its group's mean
df2$Income_Mean <- ave(df$Income, df$Country, df$Occupation)
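Unlike the two merge() approaches, `ave()` keeps the original row order, because it returns a vector aligned with its input. A small sketch on a trimmed copy of the data that checks the first few rows:

```r
# Trimmed copy of the question's data (Age and Gender omitted)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1, 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
                 78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

df2 <- df
# FUN defaults to mean; groups are defined by the Country/Occupation pairs
df2$Income_Mean <- ave(df$Income, df$Country, df$Occupation)

# Rows stay in their original order: AT/1, AT/3, AT/5, AT/3, ...
head(df2, 4)
```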

(4) dplyr

library(dplyr)
# mutate() keeps all rows and attaches each row's group mean
df %>% group_by(Country, Occupation) %>%
       mutate(Income_Mean = mean(Income))

Output:

   Country Occupation   Age Gender Income Income_Mean
   <fct>        <int> <int> <fct>   <int>       <dbl>
 1 AT               1    20 female    100       128  
 2 AT               3    46 female     80        71  
 3 AT               5    30 male       78        86.5
 4 AT               3    12 male       29        71  
 5 AT               1    73 male      156       128  
 6 AT               2    53 female     56        56  
 7 AT               5    19 male       95        86.5
 8 AT               3    43 male      104        71  
 9 CH               5    65 male       87        82.5
10 CH               3    53 female     56        84
# ... with 15 more rows
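`mutate()` keeps all 25 rows and repeats each group's mean. If you want one row per group instead, `summarise()` is the counterpart. A sketch on a trimmed copy of the data, assuming dplyr 1.0.0 or later (needed for the `.groups` argument):

```r
library(dplyr)

# Trimmed copy of the question's data (Age and Gender omitted)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1, 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
                 78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

# One row per Country/Occupation group; .groups = "drop" returns an ungrouped tibble
income_means <- df %>%
  group_by(Country, Occupation) %>%
  summarise(Income_Mean = mean(Income), .groups = "drop")

print(income_means)
```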

Answer 2

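Using sqldf. Note that Age and Gender are not in the GROUP BY, so sqldf's default SQLite backend fills them from an arbitrary row of each group; only Country, Occupation and avg(Income) are meaningful per group:

library(sqldf)
sqldf("select Country,Occupation,Age,Gender,avg(Income) from df group by Country,Occupation")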

OR

Using data.table:

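library(data.table)
dt <- as.data.table(df)  # convert a copy; the original data frame is left as is
# One row per group, with the mean in a named column
dt[, .(Income_Mean = mean(Income)), by = .(Country, Occupation)]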

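Output (from the sqldf query; the data.table call returns the same group means without the Age and Gender columns, with groups in order of first appearance rather than sorted):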

    Country Occupation Age Gender avg(Income)
1       AT          1  73   male       128.0
2       AT          2  53 female        56.0
3       AT          3  43   male        71.0
4       AT          5  19   male        86.5
5       CH          1  18 female       138.0
6       CH          2  34   male        45.0
7       CH          3  39   male        84.0
8       CH          5  25 female        82.5
9       DE          1  32 female       123.0
10      DE          2  51 female        59.5
11      DE          3  29   male        72.0
12      DE          5  19   male        76.0
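Answer 1 also shows how to attach the group mean to every row. For comparison, data.table can do the same by reference with `:=` combined with `by`. A sketch on a trimmed copy of the data, assuming data.table is installed:

```r
library(data.table)

# Trimmed copy of the question's data (Age and Gender omitted)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3, 5, 3, 1, 2, 1, 5, 3, 3, 1, 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104, 87, 56, 203, 45, 112,
                 78, 56, 140, 99, 67, 89, 109, 43, 145, 30, 101, 77)
)

dt <- as.data.table(df)
# Adds Income_Mean in place; all 25 rows are kept in their original order
dt[, Income_Mean := mean(Income), by = .(Country, Occupation)]
head(dt)
```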

Calculate the mean of a column grouped by the values of two other columns
