I have the following data frame df, where ARTNR is the article number (article numbers are repeated):
ARTNR AMOUNT
20 10
12 10
12 10
20 10
12 100
20 200
... ...
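I want to create a data frame df_delta with the following columns:
ARTNR = the article number (each article number once, without duplicates)
sum_1 = the sum of AMOUNT for each ARTNR
sum_minus_max = sum_1 - the maximum AMOUNT for that ARTNR
average = sum_minus_max / (n - 1), where n is the number of rows for that ARTNR
delta = average - the maximum AMOUNT for that ARTNR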
Can anyone help me? I would really appreciate it!
Thank you very much!
Answer 0 (score: 1)
You can use dplyr to process the data like this:
library(dplyr)

df <- data.frame(ARTNR  = c(20, 12, 12, 20, 12, 20),
                 AMOUNT = c(10, 10, 10, 10, 100, 200))

# One row per ARTNR; later columns reuse the ones computed before them
df %>%
  group_by(ARTNR) %>%
  summarize(sum_1         = sum(AMOUNT),
            sum_minus_max = sum_1 - max(AMOUNT),
            average       = sum_minus_max / (n() - 1),
            delta         = average - max(AMOUNT))
This gives:
# A tibble: 2 x 5
ARTNR sum_1 sum_minus_max average delta
<dbl> <dbl> <dbl> <dbl> <dbl>
1 12 120 20 10 -90
2 20 220 20 10 -190
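As a quick check of the arithmetic, the row for ARTNR 12 can be reproduced by hand in base R. This is only a sketch: the variable names x, single and single_average are illustrative and not part of the answer above. It also shows what the n - 1 divisor does, as far as I can tell, for an article number that appears only once.

```r
# AMOUNT values for ARTNR 12, taken from the example data above
x <- c(10, 10, 100)

sum_1         <- sum(x)                           # 120
sum_minus_max <- sum_1 - max(x)                   # 20
average       <- sum_minus_max / (length(x) - 1)  # 10
delta         <- average - max(x)                 # -90

# Hypothetical group with a single row: the formula becomes 0 / 0,
# which R reports as NaN
single <- c(50)
single_average <- (sum(single) - max(single)) / (length(single) - 1)
is.nan(single_average)
```

If single-row article numbers can occur in the real data, the average and delta columns would need an explicit rule for that case.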
Answer 1 (score: 1)
You can use aggregate like this:
newDataFrameName <- do.call(cbind, aggregate(AMOUNT ~ ARTNR, df, function(x) {
  sumx  <- sum(x)
  maxx  <- max(x)
  # mean of the values that are not equal to the maximum
  meanx <- mean(x[x != maxx])
  c(sum_1 = sumx, sum_minus_max = sum(x[x != maxx]), average = meanx, delta = meanx - maxx)
}))
newDataFrameName
# ARTNR sum_1 sum_minus_max average delta
#[1,] 12 120 20 10 -90
#[2,] 20 220 20 10 -190
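One thing worth checking with this approach is how it behaves when the maximum occurs more than once in a group. The sketch below uses made-up values (not from the question) to compare filtering with != against subtracting the maximum once; the variable names are illustrative only.

```r
# Hypothetical AMOUNT values in which the maximum occurs twice
x <- c(5, 10, 10)

# Filtering with != drops every occurrence of the maximum
drop_all_max <- sum(x[x != max(x)])       # 5

# Subtracting max(x) once removes only a single occurrence
drop_one_max <- sum(x) - max(x)           # 15

# which.max() returns the index of the first maximum, so this also drops one
drop_first_max <- sum(x[-which.max(x)])   # 15
```

For the example data in the question the maximum is unique in each group, so both readings give the same table. They differ only when a group has tied maxima.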
Answer 2 (score: 0)
Try the following script:
library(dplyr)
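remove_max <- function(vector) {
  # Leave a single-element vector untouched so the mean is still defined
  if (length(vector) == 1) return(vector)
  # which.max() gives the first maximum, so only one occurrence is removed,
  # consistent with sum_minus_max = sum_1 - max(AMOUNT)
  vector[-which.max(vector)]
}

# df is the data frame from the question
df %>%
  group_by(ARTNR) %>%
  summarize(
    sum_1 = sum(AMOUNT),
    sum_minus_max = sum_1 - max(AMOUNT),
    average = mean(remove_max(AMOUNT)),
    delta = average - max(AMOUNT)
  )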
Hope this helps.