I have a data.frame and I need to calculate the mean per "anti-group" (i.e. per Name, below).
Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32
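My desired output is shown below, where the values for Rate1 and Rate2 are the means of the column's values NOT found in each group. Please disregard the exact values; I made them up for the example. I'd prefer to use dplyr if possible.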
Name Rate1 Rate2
Aira 38 52.2
Ben 30.5 50.5
Cat 23.8 48.7
Any help is much appreciated! Thank you!
P.S. Thanks to Ianthe: I copied their question and its data, but changed the question slightly. (Mean per group in a data.frame)
Answer 0 (score: 2)
Here is another idea using base R:
res1 <- do.call(rbind, lapply(unique(df$Name), function(i) colMeans(df[!df$Name %in% i, -c(1:2)])))
res1
# Rate1 Rate2
#[1,] 38.00000 52.16667
#[2,] 30.50000 50.50000
#[3,] 23.83333 48.66667
Or, to include the Name column in the result,
cbind.data.frame(Name = unique(df$Name), res1)
# Name Rate1 Rate2
#1 Aira 38.00000 52.16667
#2 Ben 30.50000 50.50000
#3 Cat 23.83333 48.66667
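As a sanity check on the numbers above, the Aira row can be reproduced by hand: it is simply the mean over every row whose Name is not Aira. A minimal sketch, with the values copied from the Ben and Cat rows of the question (the vector names are only for illustration):

```r
# Rate1 and Rate2 for all rows whose Name is not "Aira" (the Ben and Cat rows)
rate1_not_aira <- c(53, 22, 19, 22, 67, 45)
rate2_not_aira <- c(19, 87, 45, 87, 43, 32)

mean(rate1_not_aira)  # 228 / 6 = 38
mean(rate2_not_aira)  # 313 / 6, about 52.16667
```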
Answer 1 (score: 1)
library(tidyverse)
# example dataset
df = read.table(text = "
Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32
", header=T, stringsAsFactors=F)
# function that returns means of Rates after excluding a given name
AntiGroupMean = function(x) { df %>% filter(Name != x) %>% summarise_at(vars(matches("Rate")), mean) }
df %>%
distinct(Name) %>% # for each name
mutate(v = map(Name, AntiGroupMean)) %>% # apply the function
unnest(v) # unnest results
# # A tibble: 3 x 3
# Name Rate1 Rate2
# <chr> <dbl> <dbl>
# 1 Aira 38 52.2
# 2 Ben 30.5 50.5
# 3 Cat 23.8 48.7
Answer 2 (score: 1)
One option could be:
df %>%
mutate_at(vars(Rate1, Rate2), list(sum = ~ sum(.))) %>%
mutate(rows = n()) %>%
group_by(Name) %>%
summarise(Rate1 = first((Rate1_sum - sum(Rate1))/(rows-n())),
Rate2 = first((Rate2_sum - sum(Rate2))/(rows-n())))
Name Rate1 Rate2
<chr> <dbl> <dbl>
1 Aira 38 52.2
2 Ben 30.5 50.5
3 Cat 23.8 48.7
Or in a less tidy form:
df %>%
group_by(Name) %>%
summarise(Rate1 = first((sum(df$Rate1) - sum(Rate1))/(nrow(df)-n())),
Rate2 = first((sum(df$Rate2) - sum(Rate2))/(nrow(df)-n())))
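Both versions rely on the same identity: the anti-group mean equals (column total minus group sum) divided by (total row count minus group row count). The scoped verbs such as `mutate_at()` have been superseded since dplyr 1.0.0, so here is a sketch of the same idea with `across()`, assuming dplyr 1.0.0 or later and the data from the question; the name `res` is only for illustration.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Name  = rep(c("Aira", "Ben", "Cat"), each = 3),
  Month = rep(1:3, times = 3),
  Rate1 = c(12, 18, 19, 53, 22, 19, 22, 67, 45),
  Rate2 = c(23, 73, 45, 19, 87, 45, 87, 43, 32)
)

res <- df %>%
  group_by(Name) %>%
  summarise(across(
    starts_with("Rate"),
    # (total of the whole column - this group's sum) / (all rows - this group's rows)
    ~ (sum(df[[cur_column()]]) - sum(.x)) / (nrow(df) - n())
  ))

res
```

Because the column is looked up by name with `cur_column()`, this extends to any number of Rate columns without writing one expression per column.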
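Answer 3 (score: 1)
You can calculate this as the mean of the group means, weighted by the number of observations in each group, but with the given row's own group having a weight of 0.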
library(dplyr)
df %>%
group_by(Name) %>%
summarise(n = n(), Rate1 = mean(Rate1), Rate2 = mean(Rate2)) %>%
mutate_at(vars(starts_with('Rate')), ~
sapply(Name, function(x) weighted.mean(.x, n*(Name != x))))
# A tibble: 3 x 4
Name n Rate1 Rate2
<chr> <int> <dbl> <dbl>
1 Aira 3 38 52.2
2 Ben 3 30.5 50.5
3 Cat 3 23.8 48.7
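The weighting by group size does not show up in the question's data, where every group has three rows. A small base R sketch on hypothetical unbalanced data (the `toy` data frame and all names below are made up for illustration) suggests why it matters: the weighted mean of group means agrees with simply dropping the group's rows and averaging the rest.

```r
# Hypothetical unbalanced data: "a" has 1 row, "b" has 2, "c" has 3
toy <- data.frame(
  Name  = c("a", "b", "b", "c", "c", "c"),
  Rate1 = c(10, 20, 40, 1, 2, 3)
)

# Per-group sizes and per-group means
n_g    <- tapply(toy$Rate1, toy$Name, length)
mean_g <- tapply(toy$Rate1, toy$Name, mean)

# Weighted mean of group means, with the excluded group's weight set to 0
weighted <- sapply(names(mean_g), function(g) {
  weighted.mean(mean_g, n_g * (names(mean_g) != g))
})

# Direct computation: drop the group's rows and average what is left
direct <- sapply(names(mean_g), function(g) mean(toy$Rate1[toy$Name != g]))

# The two approaches should agree
stopifnot(isTRUE(all.equal(weighted, direct)))
```

An unweighted mean of the remaining group means would not match here: excluding "a", the group means are 30 and 2, whose plain average is 16, while the mean over the five remaining rows is 13.2.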
Answer 4 (score: 0)
We can use
library(dplyr)
library(purrr)
map_dfr(unique(df1$Name), ~
anti_join(df1, tibble(Name = .x)) %>%
summarise_at(vars(starts_with('Rate')), mean) %>%
mutate(Name = .x)) %>%
select(Name, everything())
# Name Rate1 Rate2
#1 Aira 38.00000 52.16667
#2 Ben 30.50000 50.50000
#3 Cat 23.83333 48.66667
Data:
df1 <- structure(list(Name = c("Aira", "Aira", "Aira", "Ben", "Ben",
"Ben", "Cat", "Cat", "Cat"), Month = c(1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), Rate1 = c(12L, 18L, 19L, 53L, 22L, 19L, 22L, 67L,
45L), Rate2 = c(23L, 73L, 45L, 19L, 87L, 45L, 87L, 43L, 32L)),
class = "data.frame", row.names = c(NA,
-9L))
Answer 5 (score: 0)
You can try:
library(dplyr)
df %>%
mutate_at(
vars(contains('Rate')),
~ sapply(1:n(), function(x) mean(.[Name %in% setdiff(unique(df$Name), Name[x])], na.rm = TRUE)
)
) %>%
distinct_at(vars(-Month))
Output:
Name Rate1 Rate2
1 Aira 38.00000 52.16667
2 Ben 30.50000 50.50000
3 Cat 23.83333 48.66667
(Though the other solutions are probably a better choice, since the row-wise sapply will be very slow on larger datasets.)