Calculate a total and the min-max value for each group in an R data frame

Posted: 2018-08-15 19:35:01

Tags: r dataframe sapply

I have the sample data frame below:

df <- data.frame("Group" = c(1, 1, 2, 2, 2),
                 "H"  = c("H1", "H3", "H3", "H4", "H2"),
                 "W1" = c(95, 0, 0, 0, 50),
                 "W2" = c(0, 95, 95, 0, 85),
                 "W3" = c(85, 50, 50, 95, 0))

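I need to calculate two additional metrics. 1st metric: for each group, take the maximum of each of the columns W1, W2 and W3 over that group's rows; the metric is the percentage of these three maxima that are 85 or greater, so it is 100% when all three are. For example, for group 2 the maxima of W2 and W3 are 85 or above, but the maximum of W1 is below 85, so the result is 66.7.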

2nd metric: the minimum of the column maxima of W1, W2 and W3 within the group. For example, for group 2: min(max[0 0 50], max[95 0 85], max[50 95 0]) = min(50, 95, 95) = 50.
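Both metrics depend only on each group's column maxima, so one way to check the definitions is to collapse the data to one row per group first. The following is a minimal base-R sketch (no add-on packages assumed); `maxima` and `w_cols` are illustrative names, and the 85 threshold is taken from the question.

```r
# Sample data from the question
df <- data.frame(Group = c(1, 1, 2, 2, 2),
                 H  = c("H1", "H3", "H3", "H4", "H2"),
                 W1 = c(95, 0, 0, 0, 50),
                 W2 = c(0, 95, 95, 0, 85),
                 W3 = c(85, 50, 50, 95, 0))

w_cols <- c("W1", "W2", "W3")

# Per-group maximum of each W column: one row per group
maxima <- aggregate(cbind(W1, W2, W3) ~ Group, data = df, FUN = max)

# Metric 1: percentage of W columns whose group maximum is >= 85
maxima$W <- round(100 * rowMeans(maxima[w_cols] >= 85), 1)

# Metric 2: smallest of the per-column group maxima
maxima$MINMAX <- do.call(pmin, maxima[w_cols])

# Group 1: W = 100, MINMAX = 85; group 2: W = 66.7, MINMAX = 50
print(maxima)
```

The per-group table can then be joined back to the rows with `merge(df, maxima[c("Group", "W", "MINMAX")], by = "Group")`.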

For clarity, here is the desired output data frame:

DesiredDf <- data.frame("Group"  = c(1, 1, 2, 2, 2),
                        "H"      = c("H1", "H3", "H3", "H4", "H2"),
                        "W1"     = c(95, 0, 0, 0, 50),
                        "W2"     = c(0, 95, 95, 0, 85),
                        "W3"     = c(85, 50, 50, 95, 0),
                        "W"      = c(100, 100, 66.7, 66.7, 66.7),
                        "MINMAX" = c(85, 85, 50, 50, 50))

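I have tried loops and sapply, but the real dataset is very large and they run too slowly. I am looking for a more efficient way to compute these metrics in R.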
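To avoid a per-row loop, the group maxima can be broadcast back to every row with base R's `ave()`, which keeps the result aligned with the original rows so no merge is needed. A minimal sketch, assuming the W columns are named `W1`, `W2` and `W3` and the threshold is 85; `grp_max` and `w_cols` are illustrative names. The only loop is over the three columns, not over the rows.

```r
# Sample data from the question
df <- data.frame(Group = c(1, 1, 2, 2, 2),
                 H  = c("H1", "H3", "H3", "H4", "H2"),
                 W1 = c(95, 0, 0, 0, 50),
                 W2 = c(0, 95, 95, 0, 85),
                 W3 = c(85, 50, 50, 95, 0))

w_cols <- c("W1", "W2", "W3")

# For every row, ave() returns the maximum of the column within that row's group;
# sapply() stacks the three results into a rows-by-columns matrix.
grp_max <- sapply(w_cols, function(col) ave(df[[col]], df$Group, FUN = max))

# Metric 1: percentage of W columns whose group maximum is >= 85
df$W <- round(100 * rowMeans(grp_max >= 85), 1)

# Metric 2: row-wise minimum of the group maxima (vectorised with pmin)
df$MINMAX <- do.call(pmin, as.data.frame(grp_max))

print(df)
```

Unlike a merge-based approach, this keeps the rows in their original order.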

2 Answers:

Answer 0 (score: 2)

The data.table way:

# use data.table
library(data.table)
setDT(df)

# aggregate data by group in order to calculate the 2 desired metrics
df1 <- df[ , .(maxw1 = max(W1), maxw2 = max(W2), maxw3 = max(W3)), by=Group]

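# calculate the metrics
# metric1: fraction (0 to 1) of W columns whose group max is >= 85; multiply by 100 for a percentage
df1[ , metric1 := rowMeans(cbind(maxw1 >= 85, maxw2 >= 85, maxw3 >= 85))]
# metric2: smallest of the per-group column maxima
df1[ , metric2 := do.call(pmin, .SD), .SDcols = c("maxw1", "maxw2", "maxw3")]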

# merge the metrics back onto the original data frame (the result is sorted by Group)
df <- merge(df, df1[ , .(Group, metric1, metric2)], by="Group")

Answer 1 (score: 0)

Using dplyr:

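library(dplyr)

df %>% 
  group_by(Group) %>% 
  # w: fraction of W columns whose group maximum is >= 85
  mutate(w = rowMeans(cbind(max(W1) >= 85, max(W2) >= 85, max(W3) >= 85)),
         # minmax: smallest of the per-column group maxima
         minmax = min(max(W1), max(W2), max(W3)))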

# A tibble: 5 x 7
# Groups:   Group [2]
  Group H        W1    W2    W3     w minmax
  <dbl> <fct> <dbl> <dbl> <dbl> <dbl>  <dbl>
1    1. H1      95.    0.   85. 1.00     85.
2    1. H3       0.   95.   50. 1.00     85.
3    2. H3       0.   95.   50. 0.667    50.
4    2. H4       0.    0.   95. 0.667    50.
5    2. H2      50.   85.    0. 0.667    50.