Question

We have this data frame:

# We create the df
x <- c(1,33,5,2,56,1)
y <- c(4,358,57,3,32,2)

df <- as.data.frame(cbind(x,y))

rownames(df) <- c("a", "b", "c", "d", "e", "f")

df is:

   x   y
a  1   4
b 33 358
c  5  57
d  2   3
e 56  32
f  1   2

I want to retrieve the row names of the maximum value and the second-highest value in column x, and the same for column y.

The expected result is e and b for column x, and b and c for column y.

I tried the following code, but it did not work:

rownames(df)[max(df$x)] # for the maximum value
nx <- length(df$x) # length of the x column
rownames(df)[sort(df$x, partial=nx-1)[nx-1]] # for the second max value

However, these three lines return:

NA  # what's wrong?
6   # yeah, it is 6
"e" # nope, the second max is "b"

What is going wrong, and how can I fix it?

Answer 1

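You were close. max() and sort() return values, not row positions, so select the rows whose value equals the one you are looking for: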

rownames(df[df$x == max(df$x),]) # for the maximum value
nx <- length(df$x) # length of the x column
rownames(df[df$x == sort(df$x, partial=nx-1)[nx-1],]) # for the second max value
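
To see why the original attempt returns NA, it may help to compare values with positions. The following is a minimal sketch, assuming the df from the question; which.max() and order() are base R functions, while the names pos_max, ord and second_name are only illustrative.

```r
# Rebuild the data frame from the question
x <- c(1, 33, 5, 2, 56, 1)
y <- c(4, 358, 57, 3, 32, 2)
df <- data.frame(x, y, row.names = c("a", "b", "c", "d", "e", "f"))

# max() gives the largest value (56); df has only 6 rows,
# so using it as an index yields NA
rownames(df)[max(df$x)]

# which.max() gives the position of the largest value instead
pos_max <- which.max(df$x)
rownames(df)[pos_max]

# order() returns positions sorted by value, so its second entry
# is the position of the second-largest value
ord <- order(df$x, decreasing = TRUE)
second_name <- rownames(df)[ord[2]]
second_name
```

With this data, pos_max is 5 (row "e") and second_name is "b". Unlike the equality test above, this approach returns exactly one row per rank even when values are tied.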

Answer 2

We can loop over the columns, order each one in decreasing order, use that index to get the row names, and keep the first two:

sapply(df, function(x) head(row.names(df)[order(x, decreasing = TRUE)], 2))
#      x   y  
#[1,] "e" "b"
#[2,] "b" "c"
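
If the values are wanted alongside the row names, the same order() idea can be wrapped in a small helper. This is only a sketch under the assumption that df is the data frame from the question; top_n_rows is a hypothetical helper name, not a base R function.

```r
# Rebuild the data frame from the question
x <- c(1, 33, 5, 2, 56, 1)
y <- c(4, 358, 57, 3, 32, 2)
df <- data.frame(x, y, row.names = c("a", "b", "c", "d", "e", "f"))

# Hypothetical helper: row names and values of the n largest entries
top_n_rows <- function(col, names, n = 2) {
  idx <- head(order(col, decreasing = TRUE), n)  # positions of the top n
  data.frame(row = names[idx], value = col[idx], stringsAsFactors = FALSE)
}

# Apply the helper to every column; the result is a named list
res <- lapply(df, top_n_rows, names = rownames(df))
res$x
res$y
```

Here res$x holds rows e and b (56 and 33), and res$y holds rows b and c (358 and 57). Ties are broken by row position, because order() keeps the earlier element first.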

Answer 3

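An alternative using dplyr::filter and the base R quantile function. Note that this keeps the rows above the 97.5th percentile rather than exactly the top two, so adjust the probability to suit your data.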

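library(dplyr)  # provides %>% and filter()
# `variable` is a placeholder for the column of interest, e.g. x or y
df %>%
  filter(variable > quantile(.$variable, 0.975, na.rm = TRUE))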
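
For comparison, the same threshold idea can be sketched in base R against column x. This assumes the df from the question and the default quantile type; the names cutoff and above are illustrative.

```r
# Rebuild the data frame from the question
x <- c(1, 33, 5, 2, 56, 1)
y <- c(4, 358, 57, 3, 32, 2)
df <- data.frame(x, y, row.names = c("a", "b", "c", "d", "e", "f"))

# 97.5th percentile of x, using quantile()'s default interpolation
cutoff <- quantile(df$x, 0.975, na.rm = TRUE)

# Row names whose x value lies above the cutoff
above <- rownames(df)[df$x > cutoff]
above
```

With only six rows, the cutoff falls between 33 and 56, so only "e" is returned. A percentile threshold is therefore not a drop-in replacement for "top two" on small data.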

Retrieve the row names of the maximum and second-maximum values of a column in R
