Creating a vector that meets two factor conditions in a data frame

Time: 2013-10-08 21:33:05

Tags: r

I'm new to R and struggling with the syntax. Suppose I have the following data frame, df:

value   label    second
1       a        q
2       a        q
3       a        ASDF
4       b        q
6       b        QWERTY
6       b        QWERTY
7       c        q
8       c        q
9       c        q
10      d        q
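To make the example reproducible, here is one way to build this data frame in R. This is a sketch: stringsAsFactors = TRUE is an assumption, chosen so that `label` and `second` are factors, which matches the `Levels:` lines printed in the answers (R 4.0 and later no longer convert strings to factors by default).

```r
# Rebuild the example data frame from the question.
# stringsAsFactors = TRUE makes `label` and `second` factors, as in 2013-era R.
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = TRUE
)
str(df)
```

One of the answers calls the same data `dat`; running `dat <- df` lets you follow it verbatim.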

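Now I want to get the vector of df$second values that correspond to the maximum of df$value for a given value of df$label. For example, given df$label = 'a', I want to return 'ASDF'. For df$label = 'b', I want to return 'QWERTY', 'QWERTY'.

Here is what I'm trying:

This does not produce the correct result vector. I'd like to find a way to do this with sapply, tapply, mapply, etc., but I just can't get my head around these functions. Any help would be greatly appreciated.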
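Since the question asks about the apply family, here is a sketch using tapply(). It does not come from any of the answers. tapply() computes one maximum per label, and each row is then compared against its own label's maximum. The names `maxes`, `is_max` and `res` are illustrative only.

```r
# Example data from the question (factors, as in the answers).
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = TRUE
)

# tapply() applies max() to `value` within each level of `label`,
# giving one maximum per label, named by the label.
maxes <- tapply(df$value, df$label, max)

# Look up each row's own group maximum by name and flag rows that reach it.
is_max <- df$value == maxes[as.character(df$label)]

# Split the matching `second` values back out by label.
res <- split(as.character(df$second[is_max]), df$label[is_max])

res[["a"]]  # "ASDF"
res[["b"]]  # "QWERTY" "QWERTY"
```

Ties are kept automatically: both 6s under label b equal that label's maximum, so both QWERTY values are returned.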

3 answers:

Answer 0 (score: 3)

Straightforward in data.table:

library(data.table)
DT <- data.table(df, key="label")
lab <- "b"  # whatever label value you are trying to find
DT[.(lab)][value==max(value), second]

Note that if you want to do this for all values of label, use the by argument:

DT[, c(.SD, mx=max(value)), by=label][value==mx, second, by=label]

   label second
1:     a   ASDF
2:     b QWERTY
3:     b QWERTY
4:     c      q
5:     d      q
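For comparison, here is a base R sketch of the same all-labels result without data.table. It swaps in ave(), which this answer does not use: ave(df$value, df$label, FUN = max) returns, for every row, the maximum value of that row's label, so comparing it with value keeps exactly the rows listed above.

```r
# Example data from the question.
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = TRUE
)

# ave() returns a vector as long as df$value in which each element is
# replaced by max() of its own label group. FUN must be named, because
# ave()'s `...` argument holds the grouping variables.
group_max <- ave(df$value, df$label, FUN = max)

# Keep the rows where value equals its group's maximum.
res <- df[df$value == group_max, c("label", "second")]
res
```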

Answer 1 (score: 2)

lapply( split(dat, dat$label),
       function(df) df[df$value == max(df$value), "second"] )
$a
[1] ASDF
Levels: ASDF q QWERTY

$b
[1] QWERTY QWERTY
Levels: ASDF q QWERTY

$c
[1] q
Levels: ASDF q QWERTY

$d
[1] q
Levels: ASDF q QWERTY

If you want to get rid of the factor baggage:

 lapply( split(dat, dat$label), 
    function(df) as.character(df[df$value == max(df$value), "second"]) )
$a
[1] "ASDF"

$b
[1] "QWERTY" "QWERTY"

$c
[1] "q"

$d
[1] "q"

To extract a particular leaf, assign the result to a variable and extract it with "[[":

val <- lapply( split(dat, dat$label), 
    function(df) as.character(df[df$value == max(df$value), "second"]) )
val[["a"]]
#[1] "ASDF"
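If you only ever need one label at a time, splitting the whole data frame is more work than necessary. Here is a minimal sketch of a helper that subsets to one label first. The name `second_at_max` is hypothetical, and it assumes the `value`, `label` and `second` columns used in this answer.

```r
# Example data, named `dat` as in this answer.
dat <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: the `second` value(s) at the maximum `value`
# among the rows whose `label` equals `lab`, returned as character.
second_at_max <- function(dat, lab) {
  sub <- dat[dat$label == lab, ]
  as.character(sub$second[sub$value == max(sub$value)])
}

second_at_max(dat, "a")  # "ASDF"
second_at_max(dat, "b")  # "QWERTY" "QWERTY"
```

For a label that does not occur, max() of an empty vector gives a warning and the function returns character(0).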

Answer 2 (score: 1)

An alternative with the base R function by:

df2 <- by(data = df, df$label, function(x) x[x$value == max(x$value), ])

# result as a list
df2
# df$label: a
# value label second
# 3     3     a   ASDF
# -------------------------------------------------------------------- 
#   df$label: b
# value label second
# 5     6     b QWERTY
# 6     6     b QWERTY
# -------------------------------------------------------------------- 
#   df$label: c
# value label second
# 9     9     c      q
# -------------------------------------------------------------------- 
#   df$label: d
# value label second
# 10    10     d      q

# ...or as a data frame
do.call(rbind, df2)
#     value label second
# a       3     a   ASDF
# b.5     6     b QWERTY
# b.6     6     b QWERTY
# c       9     c      q
# d      10     d      q
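The row names produced by do.call(rbind, df2) (a, b.5, b.6, ...) are built from the group names. Here is a small follow-up sketch, assuming you want plain row numbers and a character vector for one label rather than a factor.

```r
# Example data from the question.
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = TRUE
)

# Same steps as the answer: per-label max rows, then stack them.
df2 <- by(data = df, df$label, function(x) x[x$value == max(x$value), ])
res <- do.call(rbind, df2)

# Drop the group-derived row names and use character instead of factor.
rownames(res) <- NULL
res$second <- as.character(res$second)

# The `second` values for label "b"
res$second[res$label == "b"]  # "QWERTY" "QWERTY"
```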