Question

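I am trying to write some code that identifies the two largest values in each row and reports both their column numbers and their values.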

df = data.frame(car = c(2, 1, 1, 1, 0), bus = c(0, 2, 0, 1, 0),
                walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

I have managed to do this for the largest value using the max and max.col functions.

df$max = max.col(df,ties.method="first")
df$val = apply(df[ ,1:4], 1, max)

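As far as I know there is no equivalent function for the second-highest value, which makes things a bit trickier. The code below returns the second-highest value, but (importantly) not when there are ties: if the maximum occurs more than once, it skips the tied entries and returns the next distinct value. It also looks dodgy.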

sec.fun <- function (x) {
  max( x[x!=max(x)] )
}

df$val2 <- apply(df[ ,1:4], 1, sec.fun)

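Ideally the solution would not involve removing any of the original data and could also be used to find the third, fourth, ... highest values, but neither of these is an essential requirement.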
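
To make the tie problem concrete, here is a quick check of sec.fun on two made-up rows (the input vectors are illustrative assumptions, not rows taken verbatim from a real data set):

```r
# Same definition as above
sec.fun <- function(x) {
  max(x[x != max(x)])
}

# Tied maximum: both 1s are filtered out, so the result is 0,
# although the second-largest entry of the row is 1
sec.fun(c(1, 1, 0, 0))

# All values equal: nothing is left after filtering, so max() is called
# on an empty vector, which warns and returns -Inf
suppressWarnings(sec.fun(c(0, 0, 0, 0)))
```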

Answer 1

Try this:

# a function that returns the position of n-th largest
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

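This is a closure, so you can use it like this (here df is the original four-column data frame, before the max and val columns were added):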

> # position of the largest
> apply(df, 1, maxn(1))
[1] 1 4 3 1 4
> # position of the 2nd largest
> apply(df, 1, maxn(2))
[1] 2 3 1 2 1
> 
> # value of the largest
> apply(df, 1, function(x)x[maxn(1)(x)])
[1] 2 4 2 1 1
> # value of the 2nd largest
> apply(df, 1, function(x)x[maxn(2)(x)])
[1] 0 3 1 1 0
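
If the position, column name, and value are all wanted at once, the calls above could be wrapped up roughly as follows. This is only a sketch: nth_largest is a hypothetical helper name, not a base R function, and it assumes every column of the data frame is numeric.

```r
# Original four-column data frame from the question
df <- data.frame(car = c(2, 1, 1, 1, 0), bus = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# nth_largest() is a hypothetical helper; it leaves `data` unchanged
nth_largest <- function(data, n) {
  m   <- as.matrix(data)
  pos <- apply(m, 1, maxn(n))               # column index of the n-th largest, per row
  val <- m[cbind(seq_len(nrow(m)), pos)]    # pick one cell per row by (row, col) pairs
  data.frame(col = pos, name = colnames(m)[pos], val = val)
}

nth_largest(df, 2)
```

Nothing is removed from df, and the same helper covers the third or fourth largest by changing n. When values are tied, the leftmost column comes first, because order() keeps tied elements in their original order.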

Update

Why use a closure here?

One reason is that you can define functions such as:

max2 <- maxn(2)
max3 <- maxn(3)

Then use them:

> apply(df, 1, max2)
[1] 2 3 1 2 1
> apply(df, 1, max3)
[1] 3 2 2 3 2

I'm not sure the advantage is obvious, but I like this approach because it is more functional in style.
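
As one possible illustration of that style, the same factory can produce every rank in a single pass. This is a sketch, assuming df is the original four-column data frame; the name positions is only illustrative.

```r
df <- data.frame(car = c(2, 1, 1, 1, 0), bus = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# For each rank n, build the closure maxn(n) and apply it to every row.
# The result is a matrix with one row per row of df and one column per rank.
positions <- sapply(1:3, function(n) apply(df, 1, maxn(n)))
positions
```

Column k of positions holds the column index of the k-th largest value in each row, so the first column matches apply(df, 1, maxn(1)) and the third matches apply(df, 1, max3).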
