I have this example df:
df <- structure(list(art_id = c(1L, 1L, 3L, 4L), scr1 = c(52L, 58L, 40L, 62L), scr2 = c(25L, 23L, 55L, 26L), scr3 = c(36L, 60L, 19L, 22L)), row.names = c(NA, -4L), class = "data.frame")
> df
  art_id scr1 scr2 scr3
1      1   52   25   36
2      1   58   23   60
3      3   40   55   19
4      4   62   26   22
I am using dplyr to summarise by art_id:
df %>%
  group_by(art_id) %>%
  summarise_each(funs(sum))
  art_id  scr1  scr2  scr3
   <int> <int> <int> <int>
1      1   110    48    96
2      3    40    55    19
3      4    62    26    22
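My question: how can I add another column named top_r that contains, for each row, the name of the column with the largest value among scr1:scr3? The resulting df would look like: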
  art_id  scr1  scr2  scr3 top_r
   <int> <int> <int> <int> <chr>
1      1   110    48    96  scr1
2      3    40    55    19  scr2
3      4    62    26    22  scr1
I am comfortable with dplyr, so an answer that uses that library would be great!
Answer 0 (score: 3)
Using max.col in base R:
df$top_r <- names(df)[-1][max.col(df[-1])]
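The one-liner above works on whatever data frame it is given, so to match the desired output it has to run on the summed data rather than on the raw df. The sketch below is one possible way to do that entirely in base R. It assumes the integer art_id shown in the printed output, and the names sums and score_cols are introduced here only for illustration. It also passes ties.method = "first", because max.col breaks ties at random by default.

# Rebuild the example data (art_id assumed to be integer, as printed in the question)
df <- data.frame(
  art_id = c(1L, 1L, 3L, 4L),
  scr1 = c(52L, 58L, 40L, 62L),
  scr2 = c(25L, 23L, 55L, 26L),
  scr3 = c(36L, 60L, 19L, 22L)
)

# Sum every score column within each art_id (base R counterpart of the dplyr step)
sums <- aggregate(. ~ art_id, data = df, FUN = sum)

# Columns to compare, selected by name rather than by position
score_cols <- c("scr1", "scr2", "scr3")

# max.col() returns, for each row, the index of the largest value;
# ties.method = "first" makes the result deterministic when values tie
sums$top_r <- score_cols[max.col(as.matrix(sums[score_cols]), ties.method = "first")]
sums

Selecting the columns by name keeps the result correct even if more columns are added to the data frame later.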
Answer 1 (score: 2)
This works:
df %>%
  group_by(art_id) %>%
  summarise_each(funs(sum)) %>%
  mutate(top_r = apply(.[, 2:4], 1, function(x) names(x)[which.max(x)]))
# A tibble: 3 × 5
  art_id  scr1  scr2  scr3 top_r
   <int> <int> <int> <int> <chr>
1      1   110    48    96  scr1
2      3    40    55    19  scr2
3      4    62    26    22  scr1
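summarise_each() and funs() are deprecated in current dplyr releases, so the same idea may need adjusting on a newer installation. The following is a minimal sketch assuming dplyr 1.0.0 or later (for across() and the .groups argument). The names score_cols, sums and result are hypothetical, and max.col is swapped in for the row-wise apply() call as a vectorised alternative.

library(dplyr)

# Example data from the question (art_id assumed to be integer)
df <- data.frame(
  art_id = c(1L, 1L, 3L, 4L),
  scr1 = c(52L, 58L, 40L, 62L),
  scr2 = c(25L, 23L, 55L, 26L),
  scr3 = c(36L, 60L, 19L, 22L)
)

score_cols <- c("scr1", "scr2", "scr3")

# Step 1: sum the score columns within each art_id
sums <- df %>%
  group_by(art_id) %>%
  summarise(across(all_of(score_cols), sum), .groups = "drop")

# Step 2: for each row, look up the name of the column holding the largest sum;
# ties.method = "first" avoids max.col()'s default random tie-breaking
result <- sums %>%
  mutate(top_r = score_cols[max.col(as.matrix(sums[score_cols]), ties.method = "first")])
result

Splitting the pipeline into two steps avoids relying on the magrittr dot placeholder inside mutate().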
Answer 2 (score: 0)
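library(dplyr)
library(tidyr)
# Sum every score column within each art_id
df2 <- df %>%
  group_by(art_id) %>%
  summarise_each(funs(sum))
# Reshape to long format and keep the column name with the largest sum per art_id
df3 <- df2 %>%
  gather(top_r, Value, -art_id) %>%
  arrange(art_id, desc(Value)) %>%
  group_by(art_id) %>%
  slice(1) %>%
  select(-Value)
# Attach the winning column name back to the summed data
df_final <- df2 %>%
  left_join(df3, by = "art_id")
df_final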
# A tibble: 3 × 5
  art_id  scr1  scr2  scr3 top_r
   <int> <int> <int> <int> <chr>
1      1   110    48    96  scr1
2      3    40    55    19  scr2
3      4    62    26    22  scr1
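In current tidyr, gather() is superseded by pivot_longer(), and dplyr offers slice_max() for picking the top row of each group. The reshape-and-join approach could be written roughly as follows. This is a sketch assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later; the names sums, top and result are illustrative only.

library(dplyr)
library(tidyr)

# Example data from the question (art_id assumed to be integer)
df <- data.frame(
  art_id = c(1L, 1L, 3L, 4L),
  scr1 = c(52L, 58L, 40L, 62L),
  scr2 = c(25L, 23L, 55L, 26L),
  scr3 = c(36L, 60L, 19L, 22L)
)

# Sum the score columns within each art_id
sums <- df %>%
  group_by(art_id) %>%
  summarise(across(starts_with("scr"), sum), .groups = "drop")

# Long format: one row per (art_id, column name), then keep one top row per art_id
top <- sums %>%
  pivot_longer(-art_id, names_to = "top_r", values_to = "value") %>%
  group_by(art_id) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(art_id, top_r)

# Join the winning column name back onto the summed data
result <- left_join(sums, top, by = "art_id")
result

Setting with_ties = FALSE guarantees a single row per art_id, so the join cannot duplicate rows when two columns share the maximum.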