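How to get the second most frequent value (and the least frequent value) in each row in R

Time: 2018-10-19 12:58:09

Tags: r

For each row in a data frame, I want to find the second most frequently occurring value and the least frequently occurring value. How can I do this?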

Df:

label v1 v2 v3 v4 v5 v6
5     3  3  3  6  6  8
5     7  1  1  1  7  0
5     3  5  6  6  6  5

I want to consider every column except "label".

Expected output:

second largest occurring    least occurring
6                           8
7                           0
5                           3

Edit: after accepting an answer, I updated the example to reduce confusion.

2 answers:

Answer 0: (score: 4)

A dplyr solution:

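library(dplyr)
library(tidyr)  # gather()
library(tibble) # rowid_to_column()

df %>%
  rowid_to_column() %>%
  gather(var, val, -label, -rowid) %>%
  group_by(rowid, val) %>%
  tally() %>% # count each value per row; the result stays grouped by rowid
  # rank the counts from the largest down; assumes no ties at either position
  summarise(second_largest_occuring = val[dense_rank(desc(n)) == 2],
            least_occuring = val[n == min(n)]) %>%
  ungroup() %>%
  select(-rowid)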

# A tibble: 3 x 2
  second_largest_occuring least_occuring
                    <int>          <int>
1                       2              1
2                       2              0
3                       5              3
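A note on `dense_rank()`: it ranks in ascending order by default, so a bare `== 2` selects the second smallest item, and `desc()` is needed to count from the largest. When a row has exactly three distinct counts, as in the sample data, the two coincide. A small sketch on a made-up vector `x` (not from the thread) shows the difference:

```r
library(dplyr)

x <- c(1, 5, 5, 7, 9) # illustrative values only

# dense_rank() ranks ascending: the smallest value gets rank 1
asc_rank <- dense_rank(x)

# wrapping the input in desc() reverses this: the largest value gets rank 1
desc_rank <- dense_rank(desc(x))

second_smallest <- unique(x[asc_rank == 2])  # picks 5
second_largest  <- unique(x[desc_rank == 2]) # picks 7
```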

Data:

df <- read.table(text = "label v1 v2 v3 v4 v5 v6
5     3  3  3  2  2  1
5     2  1  1  1  2  0
5     3  5  6  6  6  5", header = TRUE)
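As a cross-check against the question's updated example, here is a minimal base R sketch of the same counting idea built on `table()`. The names `df_q` and `value_by_freq_rank` are hypothetical, introduced only for this illustration, and ties between counts are not resolved in any meaningful way.

```r
# Data from the question's updated example
df_q <- read.table(text = "label v1 v2 v3 v4 v5 v6
5 3 3 3 6 6 8
5 7 1 1 1 7 0
5 3 5 6 6 6 5", header = TRUE)

# Hypothetical helper: the value sitting at a given position once the
# values of x are ordered from most frequent to least frequent
value_by_freq_rank <- function(x, position) {
  counts <- sort(table(x), decreasing = TRUE)
  as.numeric(names(counts)[position])
}

# Every column except "label"
values <- df_q[, setdiff(names(df_q), "label")]

# Position 2 is the second most frequent value in each row
second_most <- apply(values, 1, value_by_freq_rank, position = 2)

# The last position is the least frequent value in each row
least <- apply(values, 1, function(x) value_by_freq_rank(x, length(unique(x))))
```

On this data, `second_most` and `least` should reproduce the expected output listed in the question (6, 7, 5 and 8, 0, 3).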

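Answer 1: (score: 1)

Another dplyr solution that is a bit more readable and handles NAs as well as rows where the second-largest value occurs more than once. It also lets you choose the columns with dplyr's selection syntax.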

library(dplyr)

dat = read.table(text = 'label v1 v2 v3 v4 v5 v6
5     3  3  3  2  2  1
5     2  1  1  1  2  0
5     3  5  6  6  6  5', header = TRUE)

second_largest <- function(x, na.rm = TRUE) {
  if (na.rm) { x <- na.omit(x) } # omit NA values
  # dense_rank() ranks ascending, so rank desc(x) to count from the largest value
  second_largest <- x[dense_rank(desc(x)) == 2] # all values tied for second largest
  if (length(second_largest) == 0) { return(NA) } # fewer than two distinct values
  second_largest <- max(second_largest) # collapse the ties to a single value
  return(second_largest)
}

df <- dat %>%
  mutate(
    second_largest = select(., v1:v6) %>% apply(1, second_largest,na.rm = TRUE), # apply second_largest func to every row
    min = select(., v1:v6) %>% apply(1,min,na.rm = TRUE) # apply min to every row
  ) 

#   label v1 v2 v3 v4 v5 v6 second_largest min
# 1     5  3  3  3  2  2  1              2   1
# 2     5  2  1  1  1  2  0              1   0
# 3     5  3  5  6  6  6  5              5   3

A few notes.

The 1 in the apply call means the function is applied over rows.
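To make the margin argument concrete, here is a tiny sketch on a made-up matrix `m` (not from the thread): margin 1 walks over rows, margin 2 over columns.

```r
# Two rows, three columns; the values are illustrative only
m <- matrix(c(3, 6, 8,
              7, 1, 0), nrow = 2, byrow = TRUE)

row_min <- apply(m, 1, min) # one result per row:    3 0
col_min <- apply(m, 2, min) # one result per column: 3 1 0
```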

Update

If you want the second most frequent number instead, just plug in a new function.

second_most_frequent <- function(x, is_numeric = TRUE) {
  out <- x %>%
    table() %>% # Create a table of frequencies as characters
    as.data.frame(stringsAsFactors = FALSE) %>%
    arrange(desc(Freq)) %>% # Arrange with frequency descending
    .[,1] %>% # Select the first column
    .[2] # select the second most frequent (WARNING: Doesn't check for ties)
  if(is_numeric){ out <- as.numeric(out) }
  return(out)
}

df <- df %>%
  mutate(
    second_most_freq = select(., v1:v6) %>% apply(1,second_most_frequent,is_numeric = TRUE)
  )

#   label v1 v2 v3 v4 v5 v6 second_largest min second_most_freq
# 1     5  3  3  3  2  2  1              2   1                2
# 2     5  2  1  1  1  2  0              1   0                2
# 3     5  3  5  6  6  6  5              5   3                5
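Since `second_most_frequent` does not check for ties, one possible variant is sketched below. It is an assumption about how ties might be handled, not part of the original answer: the hypothetical helper `second_most_frequent_all` returns every value whose count equals the second-highest distinct count, and `NA` when all values occur equally often.

```r
# Hypothetical helper: all values tied at the second-highest count
second_most_frequent_all <- function(x) {
  counts <- table(x)     # frequency of each distinct value
  n <- as.vector(counts) # plain integer counts
  distinct_counts <- sort(unique(n), decreasing = TRUE)
  # With a single distinct count there is no "second most frequent" value
  if (length(distinct_counts) < 2) return(NA_real_)
  as.numeric(names(counts)[n == distinct_counts[2]])
}

no_tie   <- second_most_frequent_all(c(3, 3, 3, 2, 2, 1))    # 2
with_tie <- second_most_frequent_all(c(4, 4, 4, 2, 2, 1, 1)) # 1 and 2
all_same <- second_most_frequent_all(c(5, 5))                # NA
```

Because the result can hold more than one value per row, it would probably need to be stored as a list column rather than dropped into `mutate()` as a plain numeric column.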