我有一个包含2个id
组和5个val
列的数据框。我只想保留5列中具有最高val
值的2个median
列。例如,对于a
和x
,将保留val3
和val4
;对于a
和y
,将保留val2
和val3
,依此类推。然后,将为所有其他val
列提供NA
值。
id1 id2 val1 val2 val3 val4 val5 val1_median val2_median val3_median val4_median val5_median
1 a x NA NA 0.357 0.993 NA 1.28 1.41 1.74 1.56 1.45
2 a x NA NA 3.130 2.130 NA 1.28 1.41 1.74 1.56 1.45
3 a y NA 3.68 2.990 NA NA 3.88 4.83 4.59 3.07 2.98
4 a y NA 5.99 6.190 NA NA 3.88 4.83 4.59 3.07 2.98
5 b x NA NA 2.250 0.851 NA 2.41 2.66 2.91 2.85 2.76
6 b x NA NA 3.560 4.850 NA 2.41 2.66 2.91 2.85 2.76
7 b y NA NA NA 3.600 3.12 2.96 1.83 2.36 3.12 3.31
8 b y NA NA NA 2.640 3.50 2.96 1.83 2.36 3.12 3.31
我该怎么做?任何输入将不胜感激!
library(dplyr)
df <- data.frame(
val1 = c(0.373546189257668, 2.18364332422208, 2.16437138758995,
5.59528080213779, 1.32950777181536,
3.17953161588198, 3.48742905242849, 2.73832470512922),
val2 = c(0.94932754091116, 1.87825493706573, 3.6761525560408,
5.98512403854922, 0.708267191273557,
0.964831728704485, 4.61235997057159, 2.69339109611399),
val3 = c(0.357355926158721, 3.12747953490738, 2.98559258268804,
6.1891821233553, 2.24848514342358,
3.96166791661305, 3.56199403579368, 0.748973009265844),
val4 = c(0.993371937152378, 2.12751458469308, 2.00857588088462,
4.12452841823852, 0.85135771670674,
3.59747317608169, 4.84610860395753, 2.63553697778622),
val5 = c(0.761217800817037, 2.12983828363918, 0.787311830761346,
5.18028623883811, 0.935217818105011,
3.1202182191708, 4.58745442441237, 3.50150045358676),
id1 = as.factor(c("a", "a", "a", "a", "b", "b", "b", "b")),
id2 = as.factor(c("x", "x", "y", "y", "x", "y", "x", "y"))
)
df1 <- df %>%
group_by(id1, id2) %>%
mutate_at(vars(-group_cols()), funs(median = median(., na.rm = TRUE)))
df1
id1 id2 val1 val2 val3 val4 val5 val1_median val2_median val3_median val4_median val5_median
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a x 0.374 0.949 0.357 0.993 0.761 1.28 1.41 1.74 1.56 1.45
2 a x 2.18 1.88 3.13 2.13 2.13 1.28 1.41 1.74 1.56 1.45
3 a y 2.16 3.68 2.99 2.01 0.787 3.88 4.83 4.59 3.07 2.98
4 a y 5.60 5.99 6.19 4.12 5.18 3.88 4.83 4.59 3.07 2.98
5 b x 1.33 0.708 2.25 0.851 0.935 2.41 2.66 2.91 2.85 2.76
6 b x 3.49 4.61 3.56 4.85 4.59 2.41 2.66 2.91 2.85 2.76
7 b y 3.18 0.965 3.96 3.60 3.12 2.96 1.83 2.36 3.12 3.31
8 b y 2.74 2.69 0.749 2.64 3.50 2.96 1.83 2.36 3.12 3.31
答案 0 :(得分:1)
从您的df1
开始,我们可以将中值和值列分开。对于所有中位数列,我们找到了要转换为NA
的所有值。因此,我们按行order
df1
并获得除前2个以外的所有行索引。创建一个行列矩阵并将其值更改为NA
。
med_cols <- grep("median$", names(df1))
val_cols <- grep("^val", names(df1))
df1[val_cols][cbind(seq_len(nrow(df1)),
c(t(apply(df1[med_cols], 1, order))[, 1:(length(med_cols) - 2)]))] <- NA
# val1 val2 val3 val4 val5 id1 id2 val1_median val2_median val3_median val4_median val5_median
# <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 NA NA 0.357 0.993 NA a x 1.28 1.41 1.74 1.56 1.45
#2 NA NA 3.13 2.13 NA a x 1.28 1.41 1.74 1.56 1.45
#3 NA 3.68 2.99 NA NA a y 3.88 4.83 4.59 3.07 2.98
#4 NA 5.99 6.19 NA NA a y 3.88 4.83 4.59 3.07 2.98
#5 NA NA 2.25 0.851 NA b x 2.41 2.66 2.91 2.85 2.76
#6 NA NA NA 3.60 3.12 b y 2.96 1.83 2.36 3.12 3.31
#7 NA NA 3.56 4.85 NA b x 2.41 2.66 2.91 2.85 2.76
#8 NA NA NA 2.64 3.50 b y 2.96 1.83 2.36 3.12 3.31
答案 1 :(得分:0)
这将为您提供按组划分的中位数:
df %>% group_by(id1, id2) %>% summarise_at(vars(starts_with("val")), median)