如何建立根据定义的列名对升序或降序排名的等级函数?

时间:2019-01-31 17:36:00

标签: r function dplyr

我正在寻找一个基于函数中定义的变量名称以升序或降序对变量进行排名的函数。

我可以手动进行排名,但是我希望能够调用该函数以简化df的代码。寻找某人向我展示如何在宽而长的df上执行该功能。我的示例代码如下。我希望收看电视节目,并且分钟数排名递增,其他各列均按降序排列。如果有人可以向我展示如何执行该函数,以便为升序和降序变量定义变量名,以及仅定义要按降序排列的变量,而所有其他列均默认为变量的另一种选择,那就太好了升序。

library(tidyverse)

df <- tibble::tribble(
                ~Name, ~Team, ~minutes, ~ftm, ~fta, ~oreb, ~dreb, ~treb, ~ast, ~stl, ~blk, ~tov, ~pts, ~eff,
  "Russell Westbrook", "OKC",     34.6,  8.8, 10.4,   1.7,     9,  10.7, 10.4,  1.6,  0.4,  5.4, 31.6, 33.8,
       "James Harden", "HOU",     36.4,  9.2, 10.9,   1.2,     7,   8.1, 11.2,  1.5,  0.5,  5.7, 29.1, 32.4,
      "Isaiah Thomas", "BOS",     33.8,  7.8,  8.5,   0.6,   2.1,   2.7,  5.9,  0.9,  0.2,  2.8, 28.9, 24.7,
      "Anthony Davis", "NOP",     36.1,  6.9,  8.6,   2.3,   9.5,  11.8,  2.1,  1.3,  2.2,  2.4,   28, 31.1,
      "DeMar DeRozan", "TOR",     35.4,  7.4,  8.7,   0.9,   4.3,   5.2,  3.9,  1.1,  0.2,  2.4, 27.3, 22.7,
     "Damian Lillard", "POR",     35.9,  6.5,  7.3,   0.6,   4.3,   4.9,  5.9,  0.9,  0.3,  2.6,   27, 24.5,
   "DeMarcus Cousins", "NOP",     34.2,  7.2,  9.3,   2.1,   8.9,    11,  4.6,  1.4,  1.3,  3.7,   27, 28.5,
       "LeBron James", "CLE",     37.8,  4.8,  7.2,   1.3,   7.3,   8.6,  8.7,  1.2,  0.6,  4.1, 26.4,   31,
      "Kawhi Leonard", "SAS",     33.4,  6.3,  7.2,   1.1,   4.7,   5.8,  3.5,  1.8,  0.7,  2.1, 25.5, 25.3,
      "Stephen Curry", "GSW",     33.4,  4.1,  4.6,   0.8,   3.7,   4.5,  6.6,  1.8,  0.2,    3, 25.3, 25.2
  )

df_wide <- df %>% 
  mutate_at(vars(ftm, ast), funs(rank = rank(desc(.)))) %>%
  mutate_at(vars(tov, minutes), funs(rank = rank((.))))

df_wide
#> # A tibble: 10 x 18
#>    Name  Team  minutes   ftm   fta  oreb  dreb  treb   ast   stl   blk
#>    <chr> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Russ~ OKC      34.6   8.8  10.4   1.7   9    10.7  10.4   1.6   0.4
#>  2 Jame~ HOU      36.4   9.2  10.9   1.2   7     8.1  11.2   1.5   0.5
#>  3 Isai~ BOS      33.8   7.8   8.5   0.6   2.1   2.7   5.9   0.9   0.2
#>  4 Anth~ NOP      36.1   6.9   8.6   2.3   9.5  11.8   2.1   1.3   2.2
#>  5 DeMa~ TOR      35.4   7.4   8.7   0.9   4.3   5.2   3.9   1.1   0.2
#>  6 Dami~ POR      35.9   6.5   7.3   0.6   4.3   4.9   5.9   0.9   0.3
#>  7 DeMa~ NOP      34.2   7.2   9.3   2.1   8.9  11     4.6   1.4   1.3
#>  8 LeBr~ CLE      37.8   4.8   7.2   1.3   7.3   8.6   8.7   1.2   0.6
#>  9 Kawh~ SAS      33.4   6.3   7.2   1.1   4.7   5.8   3.5   1.8   0.7
#> 10 Step~ GSW      33.4   4.1   4.6   0.8   3.7   4.5   6.6   1.8   0.2
#> # ... with 7 more variables: tov <dbl>, pts <dbl>, eff <dbl>,
#> #   ftm_rank <dbl>, ast_rank <dbl>, tov_rank <dbl>, minutes_rank <dbl>

df_long <- df %>%
  gather(key = data_col, value = "stat_value", 3:14) %>% 
  group_by(data_col) %>% 
  mutate(rank = if_else(data_col %in% c("tov", "minutes"), rank(stat_value, ties.method = "first"), rank(-stat_value, ties.method = "first")))

df_long
#> # A tibble: 120 x 5
#> # Groups:   data_col [12]
#>    Name              Team  data_col stat_value  rank
#>    <chr>             <chr> <chr>         <dbl> <int>
#>  1 Russell Westbrook OKC   minutes        34.6     5
#>  2 James Harden      HOU   minutes        36.4     9
#>  3 Isaiah Thomas     BOS   minutes        33.8     3
#>  4 Anthony Davis     NOP   minutes        36.1     8
#>  5 DeMar DeRozan     TOR   minutes        35.4     6
#>  6 Damian Lillard    POR   minutes        35.9     7
#>  7 DeMarcus Cousins  NOP   minutes        34.2     4
#>  8 LeBron James      CLE   minutes        37.8    10
#>  9 Kawhi Leonard     SAS   minutes        33.4     1
#> 10 Stephen Curry     GSW   minutes        33.4     2
#> # ... with 110 more rows

我想要的输出将与上面列出的df相同。我正在寻找创建一个清理手册if_else和上面两行代码的函数。假设该函数名为stat_rank。我希望代码可以执行以下操作:

df_wide <- df %>% 
  mutate_at(vars(ftm, ast, tov, minutes), funs(rank = stat_rank(.)))) 


df_long <- df %>%
  gather(key = data_col, value = "stat_value", 3:14) %>% 
  group_by(data_col) %>% 
  mutate(rank = stat_rank(stat_value))

1 个答案:

答案 0 :(得分:1)

如果我们需要一个功能,那么

#owl-featured > div > div {
    margin-bottom: -45px;
}

请注意,在上述实现中,列名是在函数内进行硬编码的。如果需要更灵活,那么可以通过列名作为另一参数

stat_rank <- function(x) {
     col1 <- deparse(substitute(x))
     if(col1 %in% c('ftm', 'ast')) {
     rank(desc(x)) 
     } else rank(x)

}

df %>% 
   mutate_at(vars(ftm, ast, tov, minutes), funs(rank = stat_rank))
# A tibble: 10 x 18
#   Name         Team  minutes   ftm   fta  oreb  dreb  treb   ast   stl   blk   tov   pts   eff ftm_rank ast_rank tov_rank minutes_rank
#   <chr>        <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl>        <dbl>
# 1 Russell Wes… OKC      34.6   8.8  10.4   1.7   9    10.7  10.4   1.6   0.4   5.4  31.6  33.8        2      2        9            5  
# 2 James Harden HOU      36.4   9.2  10.9   1.2   7     8.1  11.2   1.5   0.5   5.7  29.1  32.4        1      1       10            9  
# 3 Isaiah Thom… BOS      33.8   7.8   8.5   0.6   2.1   2.7   5.9   0.9   0.2   2.8  28.9  24.7        3      5.5      5            3  
# 4 Anthony Dav… NOP      36.1   6.9   8.6   2.3   9.5  11.8   2.1   1.3   2.2   2.4  28    31.1        6     10        2.5          8  
# 5 DeMar DeRoz… TOR      35.4   7.4   8.7   0.9   4.3   5.2   3.9   1.1   0.2   2.4  27.3  22.7        4      8        2.5          6  
# 6 Damian Lill… POR      35.9   6.5   7.3   0.6   4.3   4.9   5.9   0.9   0.3   2.6  27    24.5        7      5.5      4            7  
# 7 DeMarcus Co… NOP      34.2   7.2   9.3   2.1   8.9  11     4.6   1.4   1.3   3.7  27    28.5        5      7        7            4  
# 8 LeBron James CLE      37.8   4.8   7.2   1.3   7.3   8.6   8.7   1.2   0.6   4.1  26.4  31          9      3        8           10  
# 9 Kawhi Leona… SAS      33.4   6.3   7.2   1.1   4.7   5.8   3.5   1.8   0.7   2.1  25.5  25.3        8      9        1            1.5
#10 Stephen Cur… GSW      33.4   4.1   4.6   0.8   3.7   4.5   6.6   1.8   0.2   3    25.3  25.2       10      4        6            1.5

对于长格式数据,可以使用一个函数

stat_rank <- function(x, descCols) {
     col1 <- deparse(substitute(x))
     if(col1 %in% descCols) {
     rank(desc(x)) 
     } else rank(x)

}

df %>% 
   mutate_at(vars(ftm, ast, tov, minutes), 
           funs(rank = stat_rank(., descCols = c('ftm', 'ast'))))