My input data is df:
anger sad joy happy trust disgust
1 1 0 1 2 3 0
2 2 0 0 2 0 3
3 2 2 1 1 1 1
4 0 1 1 1 0 1
I want output like this:
MYDATA
anger sad joy happy trust disgust col
1 1 0 1 2 3 0 trust
2 2 0 0 2 0 3 disgust
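For each row, I want to extract the name of the column that holds the maximum value, but keep only the rows where exactly one column holds that maximum and discard every row where the maximum is shared by several columns.
I tried this: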
d1 <- df[!apply(df[-1], 1, function(x) anyDuplicated(x[x == max(x)])),]
But I got this:
anger sad joy happy trust disgust
1 1 0 1 2 3 0
2 2 0 0 2 0 3
3 2 2 1 1 1 1
I don't want the third row in the output.
Thanks in advance for your help.
Answer 0 (score: 1)
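We can subset the rows with apply and anyDuplicated, and then use max.col to get the name of the column holding each row's maximum: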
d1 <- mydata[!apply(mydata[-1], 1, anyDuplicated),]
d1$out <- names(d1)[-1][max.col(d1[-1], 'first')]
d1
# zone_id v1 v2 v3 v4 out
#1 1 12 15 18 20 v4
#3 3 31 28 14 2 v1
#4 4 12 16 9 5 v2
#5 5 5 18 10 12 v2
If the OP only wants to drop rows where the maximum value is duplicated, replace the first line with
d1 <- mydata[!apply(mydata[-1], 1, function(x) anyDuplicated(x[x == max(x)])),]
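To make the difference between the two checks concrete, here is a minimal sketch on a single made-up row (the vector `x` and the variable names are illustrative assumptions, not part of the original data). It only shows how `anyDuplicated` behaves on the whole row versus on the values equal to the maximum.

```r
# Hypothetical row: the maximum (9) is unique, but a smaller value (5) repeats.
x <- c(5, 5, 9)

# Whole-row check: returns the index of the first duplicated element (2),
# so a row like this would be dropped.
dup_any <- anyDuplicated(x)

# Max-only check: only the values equal to the maximum are inspected (just 9),
# so it returns 0 and the row would be kept.
dup_max <- anyDuplicated(x[x == max(x)])

# Negating turns 0 into TRUE (keep) and any positive index into FALSE (drop).
keep_whole_row <- !dup_any
keep_max_only <- !dup_max
```

With the whole-row check, any repeated value removes the row; with the max-only check, only a tie for the maximum does.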
Based on the OP's new dataset, we do not need to drop the first column, because it is not an id column:
d2 <- mydata1[!apply(mydata1, 1, function(x) anyDuplicated(x[x == max(x)])),]
d2$out <- names(d2)[max.col(d2, 'first')]
d2
# anger sad joy happy trust disgust out
#1 1 0 1 2 3 0 trust
#2 2 0 0 2 0 3 disgust
Data:
mydata1 <- structure(list(anger = c(1L, 2L, 2L, 0L), sad = c(0L, 0L, 2L,
1L), joy = c(1L, 0L, 1L, 1L), happy = c(2L, 2L, 1L, 1L), trust = c(3L,
0L, 1L, 0L), disgust = c(0L, 3L, 1L, 1L)), .Names = c("anger", "sad",
"joy", "happy", "trust", "disgust"), row.names = c(NA, 4L),
class = "data.frame")
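As a rough, self-contained check of why the attempt in the question kept the third row, the sketch below rebuilds the question's data and compares the filter with and without the first column. The helper `has_unique_max` is a hypothetical name, and it counts ties with `sum(x == max(x)) == 1` rather than using `anyDuplicated`; the two tests should select the same rows here, but this is an illustrative variant, not the answer's exact code.

```r
# Data copied from the question (all six columns are scores; there is no id column).
df <- data.frame(
  anger   = c(1L, 2L, 2L, 0L),
  sad     = c(0L, 0L, 2L, 1L),
  joy     = c(1L, 0L, 1L, 1L),
  happy   = c(2L, 2L, 1L, 1L),
  trust   = c(3L, 0L, 1L, 0L),
  disgust = c(0L, 3L, 1L, 1L)
)

# Hypothetical helper: TRUE when exactly one value in the row equals the row maximum.
has_unique_max <- function(x) sum(x == max(x)) == 1

# With df[-1] the "anger" column is ignored, so row 3 appears to have a unique maximum.
keep_without_anger <- apply(df[-1], 1, has_unique_max)

# With all columns, row 3 has a tie between anger and sad and is dropped.
keep_all_columns <- apply(df, 1, has_unique_max)

# Keep the rows with a unique maximum and record the name of the winning column.
result <- df[keep_all_columns, ]
result$col <- names(result)[max.col(result, ties.method = "first")]
```

Printing `keep_without_anger` and `keep_all_columns` side by side shows that only the third row changes, which matches the extra row reported in the question.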
Answer 1 (score: 0)
You can try:
library(dplyr)  # select, mutate, filter, left_join, %>%
library(tidyr)  # gather

mydata %>%
select(-zone_id) %>%
mutate(mx = do.call(pmax, (.))) %>%
select(mx) %>%
cbind(mydata) %>%
mutate( flg = rowSums(. == mx)) %>%
filter(flg ==2) %>%
select(-flg) %>%
gather(key = out, value= v, -mx, -zone_id) %>%
filter(mx == v) %>%
select(zone_id, mx, out) %>%
left_join(mydata)
which gives:
zone_id mx out v1 v2 v3 v4
1 3 31 v1 31 28 2 2
2 4 16 v2 1 16 9 1
3 5 18 v2 5 18 10 12
4 1 20 v4 12 15 18 20