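Calculating the average without the maximum value

Time: 2020-05-18 14:56:05

Tags: r

I have the following data frame df:

ARTNR = article number (article numbers are repeated)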

ARTNR   AMOUNT
20      10
12      10
12      10
20      10
12      100
20      200
...     ...       

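I want to create the data frame df_delta, with one row per ARTNR and the following columns:

sum_1 = the sum of AMOUNT for each ARTNR (I want each article number to appear only once, with no duplicates)

sum_minus_max = sum_1 - the maximum AMOUNT of that ARTNR

average = sum_minus_max / (n - 1), where n is the number of rows for that ARTNR

delta = average - the maximum AMOUNT of that ARTNR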

Can anyone help me? I would really appreciate it!

Thank you very much!

3 answers:

Answer 0: (score: 1)

You can use dplyr to manipulate the data like this:

library(dplyr)
df <- data.frame(ARTNR = c(20,12,12,20,12,20), 
                 AMOUNT = c(10,10,10,10,100,200))

df %>% group_by(ARTNR) %>% summarize(sum_1 = sum(AMOUNT), sum_minus_max = sum(AMOUNT) - max(AMOUNT), 
                  average = (sum(AMOUNT) - max(AMOUNT))/(n()-1), 
                  delta =  (sum(AMOUNT) - max(AMOUNT))/(n()-1) - max(AMOUNT))

This gives:

# A tibble: 2 x 5
  ARTNR sum_1 sum_minus_max average delta
  <dbl> <dbl>         <dbl>   <dbl> <dbl>
1    12   120            20      10   -90
2    20   220            20      10  -190
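
As a cross-check on the table above, here is a minimal base R sketch that computes the same four values for a single AMOUNT vector. The helper name delta_stats is hypothetical and not part of dplyr or of this answer. It also illustrates an edge case: for an article number that occurs only once, n - 1 is 0, so the average becomes 0/0, which R evaluates to NaN.

```r
# Hypothetical helper: the four summary values for one article's AMOUNT vector.
delta_stats <- function(amount) {
  sum_1 <- sum(amount)
  sum_minus_max <- sum_1 - max(amount)
  # Divides by n - 1; this is 0/0 (NaN) when the article occurs only once.
  average <- sum_minus_max / (length(amount) - 1)
  c(sum_1 = sum_1, sum_minus_max = sum_minus_max,
    average = average, delta = average - max(amount))
}

stats_12 <- delta_stats(c(10, 10, 100))  # AMOUNT values of ARTNR 12
stats_single <- delta_stats(50)          # made-up article that occurs only once

print(stats_12)
print(stats_single)
```

If single-occurrence articles can appear in the real data, it is probably worth deciding up front whether NaN is acceptable there or whether those rows should be handled separately.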

Answer 1: (score: 1)

You can use aggregate like this:

newDataFrameName <- do.call(cbind, aggregate(AMOUNT ~ ARTNR, df, function(x) {
  sumx <- sum(x)
  maxx <- max(x)
  meanx <- mean(x[x!=maxx])
  c(sum_1=sumx, sum_minus_max=sum(x[x!=maxx]), average=meanx, delta=meanx-maxx)}))
newDataFrameName
#    ARTNR sum_1 sum_minus_max average delta
#[1,]    12   120            20      10   -90
#[2,]    20   220            20      10  -190
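
One thing to be aware of: filtering with x != maxx removes every element equal to the maximum, whereas sum(x) - max(x) removes only one of them. The two agree on the sample data, but they can differ when the maximum is tied. A small sketch with a made-up vector (not from the question) shows the difference:

```r
# Made-up AMOUNT vector in which the maximum occurs twice.
x <- c(10, 200, 200)

drop_one_max <- sum(x) - max(x)      # subtracts a single maximum
drop_all_max <- sum(x[x != max(x)])  # drops every value equal to the maximum

print(c(drop_one_max = drop_one_max, drop_all_max = drop_all_max))
```

Which behaviour is wanted depends on how ties should be treated; the question's definition (sum_1 minus the maximum, divided by n - 1) corresponds to removing a single maximum.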

Answer 2: (score: 0)

Try the following script:

library(dplyr)

remove_max <- function(vector){ 
    # Leave a single-element vector unchanged so it is not emptied
    if(length(vector) == 1) return(vector)
    indx <- which(vector == max(vector))
    vector[-indx]
}

df %>%
    group_by(ARTNR) %>%
    summarize(
        sum_1 = sum(AMOUNT),
        sum_minus_max = sum_1 - max(AMOUNT),
        average = mean(remove_max(AMOUNT)),
        delta = average - max(AMOUNT)
    )
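
Note that which(vector == max(vector)) returns the positions of all tied maxima, so remove_max drops every one of them. If only a single maximum should be removed, to match the n - 1 divisor in the question, one possible variant uses which.max, which returns only the first position of the maximum. The name remove_one_max is hypothetical:

```r
# Hypothetical variant of remove_max that drops only the first maximum.
remove_one_max <- function(vector) {
  # Leave a single-element vector unchanged, as remove_max does.
  if (length(vector) == 1) return(vector)
  vector[-which.max(vector)]
}

kept <- remove_one_max(c(10, 200, 200))  # made-up vector with a tied maximum
print(kept)
print(mean(kept))
```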

Hope this helps.