I have the following dataset:
a <- structure(list(ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"),
                    E = c(1, 0, 0, 0, 4, 4, 3, 0, 1),
                    W = c(0, 3, 2, 0, 3, 3, 3, 4, 1),
                    N = c(1, 0, 2, 0, 3, 4, 3, 7, 1),
                    S = c(0, 0, 0, 2, 1, 1, 3, 0, 1)),
               .Names = c("ID", "E", "W", "N", "S"), row.names = c(NA, -9L), class = "data.frame")
a
ID E W N S
1 a1 1 0 1 0
2 a2 0 3 0 0
3 a3 0 2 2 0
4 a4 0 0 0 2
5 a5 4 3 3 1
6 a6 4 3 4 1
7 a7 3 3 3 3
8 a8 0 4 7 0
9 a9 1 1 1 1
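I would like to create a class column that shows the column name of the maximum value in each row. For example, in the second row (ID a2), column W has the maximum value, 3, so the class would be W and the max would be 3. When the maximum is tied, the class should include all of the tied column names. For example, in row 1 (ID a1), columns E and N both have the maximum value, so the class would be E-N and the max would be 1.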
I have started with the melt() function:
library(reshape2)
a1= melt(a, id=c("ID"))
head(a1)
ID variable value
1 a1 E 1
2 a2 E 0
3 a3 E 0
4 a4 E 0
5 a5 E 4
6 a6 E 4
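A minimal base R sketch of how the long format can be reduced to one row per ID. To stay runnable without reshape2, it rebuilds by hand the same ID/variable/value layout that melt(a, id = "ID") produces (an assumption of this sketch, not the OP's code), keeps the rows equal to each ID's maximum, and pastes their variable names together.

```r
# Example data from the question
a <- data.frame(
  ID = paste0("a", 1:9),
  E  = c(1, 0, 0, 0, 4, 4, 3, 0, 1),
  W  = c(0, 3, 2, 0, 3, 3, 3, 4, 1),
  N  = c(1, 0, 2, 0, 3, 4, 3, 7, 1),
  S  = c(0, 0, 0, 2, 1, 1, 3, 0, 1)
)

# Long format with the same columns as reshape2::melt(a, id = "ID"):
# one row per (ID, variable) pair, stacked column by column
vars <- names(a)[-1]
long <- data.frame(
  ID       = rep(a$ID, times = length(vars)),
  variable = rep(vars, each = nrow(a)),
  value    = unlist(a[vars], use.names = FALSE)
)

# For each ID, keep the rows that hit the maximum and paste their names
per_id <- lapply(split(long, long$ID), function(d) {
  top <- d[d$value == max(d$value), ]
  data.frame(ID = d$ID[1],
             class = paste(top$variable, collapse = "-"),
             max = top$value[1])
})
res <- do.call(rbind, per_id)
rownames(res) <- NULL
res
```

Because the rows inside each group keep the E, W, N, S stacking order, tied names come out in column order (E-N rather than N-E).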
Answer 0 (score: 4)
A base R approach would be to use apply() to find the max in each row along with the corresponding column names.
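# for each row, return the max and the "-"-joined names of all columns equal to it
a[c("max", "class")] <- t(apply(a[-1], 1, function(x) {
  val <- max(x)
  c(val, paste(names(a)[-1][x == val], collapse = "-"))
}))
# apply() returns a character matrix, so convert max back to numeric
a$max <- as.numeric(a$max)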
a[c(1, 6, 7)]
# ID max class
#1 a1 1 E-N
#2 a2 3 W
#3 a3 2 W-N
#4 a4 2 S
#5 a5 4 E
#6 a6 4 E-N
#7 a7 3 E-W-N-S
#8 a8 7 N
#9 a9 1 E-W-N-S
You can also use which() (as suggested by @tmfmnk) to get the column names, by replacing the last line inside the apply() call with
c(val, paste(names(which(x == val)), collapse = "-"))
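For reference, a sketch of the whole which() variant as one self-contained call. The helper name row_class and the result name out are illustrative only; the sketch also converts max back to numeric, since apply() returns a character matrix when the function returns a number and a string together.

```r
a <- data.frame(
  ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"),
  E  = c(1, 0, 0, 0, 4, 4, 3, 0, 1),
  W  = c(0, 3, 2, 0, 3, 3, 3, 4, 1),
  N  = c(1, 0, 2, 0, 3, 4, 3, 7, 1),
  S  = c(0, 0, 0, 2, 1, 1, 3, 0, 1)
)

# apply() hands each row over as a named numeric vector (E, W, N, S),
# so which(x == val) carries the names of all tied maxima
row_class <- function(x) {
  val <- max(x)
  c(val, paste(names(which(x == val)), collapse = "-"))
}

res <- t(apply(a[-1], 1, row_class))   # 9 x 2 character matrix
out <- data.frame(ID = a$ID,
                  max = as.numeric(res[, 1]),
                  class = unname(res[, 2]))
out
```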
Answer 1 (score: 3)
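1) One option is to gather() into 'long' format, group by 'ID', filter() the rows where 'val' equals the max() of 'val', then summarise() by pasting the 'key' values together and taking the first() 'val' as Max.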
library(tidyverse)
gather(a, key, val, -ID) %>%
group_by(ID) %>%
filter(val == max(val)) %>%
summarise(Class = str_c(key, collapse="-"), Max = first(val))
# A tibble: 9 x 3
# ID Class Max
# <chr> <chr> <dbl>
#1 a1 E-N 1
#2 a2 W 3
#3 a3 W-N 2
#4 a4 S 2
#5 a5 E 4
#6 a6 E-N 4
#7 a7 E-W-N-S 3
#8 a8 N 7
#9 a9 E-W-N-S 1
2) Or another tidyverse option, without reshaping:
a %>%
mutate(out = select(., -1) %>%
pmap(., ~ c(...) %>%
keep(. == max(.)) %>%
{tibble(Class = str_c(names(.), collapse='-'),
Max = first(.))})) %>%
unnest(out)
# ID E W N S Class Max
#1 a1 1 0 1 0 E-N 1
#2 a2 0 3 0 0 W 3
#3 a3 0 2 2 0 W-N 2
#4 a4 0 0 0 2 S 2
#5 a5 4 3 3 1 E 4
#6 a6 4 3 4 1 E-N 4
#7 a7 3 3 3 3 E-W-N-S 3
#8 a8 0 4 7 0 N 7
#9 a9 1 1 1 1 E-W-N-S 1
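pmap() in option 2 calls the lambda once per row, passing each column's value as a named argument, which is why c(...) yields a named vector such as c(E = 1, W = 0, N = 1, S = 0). A base R sketch of the same row-wise idea, assuming Map() as a stand-in for pmap() and plain subsetting in place of keep():

```r
a <- data.frame(
  ID = paste0("a", 1:9),
  E  = c(1, 0, 0, 0, 4, 4, 3, 0, 1),
  W  = c(0, 3, 2, 0, 3, 3, 3, 4, 1),
  N  = c(1, 0, 2, 0, 3, 4, 3, 7, 1),
  S  = c(0, 0, 0, 2, 1, 1, 3, 0, 1)
)

# Map(f, E = ..., W = ..., N = ..., S = ...) calls f once per row with
# named arguments, much like purrr::pmap() does over a list of columns
per_row <- do.call(Map, c(list(f = function(...) {
  x <- c(...)                # named vector for one row, e.g. c(E = 1, W = 0, N = 1, S = 0)
  top <- x[x == max(x)]      # keep only the (possibly tied) maxima
  data.frame(Class = paste(names(top), collapse = "-"), Max = top[[1]])
}), a[-1]))

# Stack the one-row results and bind them next to the original columns
out <- cbind(a, do.call(rbind, per_row))
out
```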
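3) Or using data.table
library(data.table)
# setDT() converts `a` to a data.table in place (by reference)
melt(setDT(a), id.vars = 'ID')[, {
  dat <- .SD[value == max(value)]
  .(Class = paste(dat$variable, collapse = '-'), Max = dat$value[1])}, by = ID]
Note: this gives the OP's expected output.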
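4) Or using base R
# assumes `a` is still a plain data.frame: after setDT() in option 3,
# a[-1] would drop the first row instead of the first column
Max <- do.call(pmax, a[-1])
Class <- tapply(names(a)[-1][col(a[-1]) * NA^(a[-1] != Max)],
                c(row(a[-1])), FUN = function(x) paste(na.omit(x), collapse = '-'))
data.frame(ID = a$ID, Class = as.vector(Class), Max = Max)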
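The NA^ expression in option 4 is the least obvious step. A small sketch on two rows of the question's data (a1, with a tie between E and N, and a5, with E alone) shows what it does: R defines y^0 as 1 even when y is NA, so NA^FALSE is 1 and NA^TRUE stays NA, which masks every cell that is not the row maximum.

```r
# Rows a1 and a5 from the question's data
m <- data.frame(E = c(1, 4), W = c(0, 3), N = c(1, 3), S = c(0, 1))

# Row-wise maximum: pmax() compares the columns element by element
Max <- do.call(pmax, m)

# m != Max compares each column with Max, giving a 2 x 4 logical matrix.
# NA^FALSE is 1 and NA^TRUE is NA, so multiplying by col(m) keeps the
# column number where a cell equals its row max and NA everywhere else.
idx <- col(m) * NA^(m != Max)

# names(m)[idx] yields a name or NA per cell (column-major order);
# grouping by row number and dropping the NAs gives the tied names
Class <- tapply(names(m)[idx], c(row(m)),
                function(x) paste(na.omit(x), collapse = "-"))
as.vector(Class)
```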