I'm new to R and struggling with the syntax. Suppose I have the following data frame, df:
value label second
1 a q
2 a q
3 a ASDF
4 b q
6 b QWERTY
6 b QWERTY
7 c q
8 c q
9 c q
10 d q
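Now I want to get the vector of df$second values that corresponds to the maximum of df$value for a given value of df$label. For example, given df$label = 'a', I want to return 'ASDF'. For df$label = 'b', I want to return 'QWERTY', 'QWERTY'.
What I have tried so far does not produce the right result vector. I would like to find a way to do this with sapply, tapply, mapply, or similar; I just can't wrap my head around these functions. Any help would be greatly appreciated.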
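For anyone who wants to reproduce the problem, the table above could be rebuilt roughly as follows. This is only a sketch: the column types (numeric value, character label and second) are an assumption, since the question shows just the printed data.

```r
# Rebuild the example data from the question.
# Column types are assumed; the original post only shows the printed table.
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = FALSE  # keep label and second as plain character vectors
)

# Inspect the structure to confirm the columns look as expected
str(df)
```

Setting stringsAsFactors = FALSE explicitly keeps the behavior the same across R versions, since the default changed in R 4.0.0.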
Answer 0 (score: 3)
This is straightforward with data.table:
library(data.table)
DT <- data.table(df, key="label")
DT[.(lab)][value==max(value), second]
# where `lab` is whatever label value you are trying to find
Note that if you want to do this for all values of label, use the by argument:
DT[, c(.SD, mx=max(value)), by=label][value==mx, second, by=label]
label second
1: a ASDF
2: b QWERTY
3: b QWERTY
4: c q
5: d q
Answer 1 (score: 2)
# `dat` is the data frame from the question (called `df` there)
lapply( split(dat, dat$label),
function(df) df[df$value == max(df$value), "second"] )
$a
[1] ASDF
Levels: ASDF q QWERTY
$b
[1] QWERTY QWERTY
Levels: ASDF q QWERTY
$c
[1] q
Levels: ASDF q QWERTY
$d
[1] q
Levels: ASDF q QWERTY
If you want to get rid of the factor baggage:
lapply( split(dat, dat$label),
function(df) as.character(df[df$value == max(df$value), "second"]) )
$a
[1] "ASDF"
$b
[1] "QWERTY" "QWERTY"
$c
[1] "q"
$d
[1] "q"
To extract a particular leaf, assign the result to a variable and extract with "[[":
val <- lapply( split(dat, dat$label),
function(df) as.character(df[df$value == max(df$value), "second"]) )
val[["a"]]
#[1] "ASDF"
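The same split-and-filter idea can also be sketched without an anonymous function by using ave(), which returns the per-group maximum aligned with the original rows. The data frame below is rebuilt from the question's table with assumed column types, and the names group_max, at_max and res are illustrative only.

```r
# Example data from the question (column types are an assumption)
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = FALSE
)

# ave() gives, for every row, the maximum of `value` within that row's label
group_max <- ave(df$value, df$label, FUN = max)

# Keep only the rows that sit at their group's maximum
at_max <- df[df$value == group_max, c("label", "second")]

# Split `second` by label: one character vector per label
res <- split(at_max$second, at_max$label)
res[["b"]]
```

Because ties are kept by the equality test, label b yields both of its QWERTY rows, matching what the question asks for.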
Answer 2 (score: 1)
An alternative using base R functions:
df2 <- by(data = df, df$label, function(x) x[x$value == max(x$value), ])
# result as a list
df2
# df$label: a
# value label second
# 3 3 a ASDF
# --------------------------------------------------------------------
# df$label: b
# value label second
# 5 6 b QWERTY
# 6 6 b QWERTY
# --------------------------------------------------------------------
# df$label: c
# value label second
# 9 9 c q
# --------------------------------------------------------------------
# df$label: d
# value label second
# 10 10 d q
# ...or as a data frame
do.call(rbind, df2)
# value label second
# a 3 a ASDF
# b.5 6 b QWERTY
# b.6 6 b QWERTY
# c 9 c q
# d 10 d q
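If only the second column for one label at a time is needed, as in the original question, the filtering step can be wrapped in a small function. This is a minimal sketch: second_at_max is a hypothetical helper name, not something from the answers above, and the data frame is rebuilt from the question's table with assumed column types.

```r
# Example data from the question (column types are an assumption)
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: values of `second` at the maximum `value` for one label
second_at_max <- function(data, lab) {
  rows <- data[data$label == lab, ]            # rows belonging to this label
  rows$second[rows$value == max(rows$value)]   # keep ties at the maximum
}

second_at_max(df, "a")
second_at_max(df, "b")

# sapply() applies the helper to every label; simplify = FALSE keeps a named list
all_labels <- sapply(unique(df$label), second_at_max, data = df, simplify = FALSE)
```

Passing simplify = FALSE matters here because the groups return vectors of different lengths, so a list is the natural result.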