How to use top_n inside an R function

Date: 2016-11-14 12:08:13

Tags: r dplyr

For a data.frame like the one below, I want to create a function that returns the 5 largest observations of a selected variable:

df1 <- structure(list(Yta = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), Rad = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Planta = c(1L, 
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L), Sortnr = c(8213L, 513L, 
8060L, 8093L, 2131L, 8200L, 2378L, 8135L, 8156L, 8256L), Dia12 = c(53L, 
29L, NA, NA, 53L, 6L, 20L, NA, 13L, 20L), Dia34 = c(177L, 39L, 
NA, NA, 0L, 77L, 101L, NA, 77L, 95L), Vit34 = c(2L, 1L, NA, NA, 
2L, 1L, 2L, NA, 1L, 1L), Ska1 = c(NA, 542L, NA, NA, 634L, NA, 
NA, NA, NA, NA), Ska2 = c(NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_), Dia34_2 = c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), block1 = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), block = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), x = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2), y = c(1, 
2, 3, 4, 5, 6, 1, 2, 3, 4), id = c("1:1:1", "1:1:2", "1:1:3", 
"1:1:4", "1:1:5", "1:1:6", "1:2:1", "1:2:2", "1:2:3", "1:2:4"
)), .Names = c("Yta", "Rad", "Planta", "Sortnr", "Dia12", "Dia34", 
"Vit34", "Ska1", "Ska2", "Dia34_2", "block1", "block", "x", "y", 
"id"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

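I would like to do this with dplyr. I tried the function below, but it throws an error. My guess is that the function's third argument is not recognized. I would appreciate any hints on how to get around this.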

prMval <- function(df, sort, varia){
  df %>% #filter(Sortnr == sort) %>%
    #filter(!(Rad %in% c(min(Rad), max(Rad))) & !(Planta %in% c(min(Planta), max(Planta)))) %>%
    top_n(5, varia)
}

prMval(df1, 2, Dia34)


Error: object 'varia' not found 

1 answer:

Answer 0 (score: 1)

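top_n takes its weight argument literally: it looks for a column named varia in the input instead of evaluating varia to get the column name you passed in. With the lazyeval package we can substitute the value of varia into the call before top_n runs: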

library(dplyr)
library(lazyeval)

prMval <- function(df, sort, varia){
  tmp <- df #%>% filter(Sortnr == sort) %>%
    #filter(!(Rad %in% c(min(Rad), max(Rad))) & !(Planta %in% c(min(Planta), max(Planta))))

  # Replace the symbol varia in the formula with the column name it holds
  # (as a symbol, e.g. Dia34), then evaluate the resulting top_n() call
  lazy_eval(interp(~top_n(tmp, 5, varia), varia = as.name(varia)))
}

prMval(df1, 2, "Dia34") # Make sure to pass a character string as varia

Returns:

# A tibble: 5 x 15
    Yta   Rad Planta Sortnr Dia12 Dia34 Vit34  Ska1  Ska2 Dia34_2 block1 block     x     y    id
  <int> <int>  <int>  <int> <int> <int> <int> <int> <int>   <int>  <int> <int> <dbl> <dbl> <chr>
1     1     1      1   8213    53   177     2    NA    NA      NA      1     1     1     1 1:1:1
2     1     1      6   8200     6    77     1    NA    NA      NA      1     1     1     6 1:1:6
3     1     2      1   2378    20   101     2    NA    NA      NA      1     1     2     1 1:2:1
4     1     2      3   8156    13    77     1    NA    NA      NA      1     1     2     3 1:2:3
5     1     2      4   8256    20    95     1    NA    NA      NA      1     1     2     4 1:2:4

I haven't figured out how to do this inside the pipe, so I have split it into separate steps.
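Not part of the original answer: a sketch of how the same function could be written inside a pipe with later tooling, assuming rlang 0.4.0 or later (for the {{ }} "curly-curly" operator) and a dplyr version built on tidy evaluation, both released after this answer. Variant 1 keeps the question's interface and takes a bare column name, forwarding it with {{ }}. Variant 2 takes a string, as the lazyeval version does, and looks the column up with the .data pronoun; the name prMval_chr is hypothetical.

```r
library(dplyr)

# Sketch assuming rlang >= 0.4.0 (for {{ }}) and a tidy-evaluation dplyr.

# Variant 1: bare column name, e.g. prMval(df1, 2, Dia34).
# {{ varia }} forwards the caller's expression (Dia34) to top_n(),
# so top_n() ranks by that column instead of looking for a column named "varia".
prMval <- function(df, sort, varia) {
  df %>%
    # filter(Sortnr == sort) %>%
    top_n(5, {{ varia }})
}

# Variant 2 (hypothetical name): column name as a string,
# e.g. prMval_chr(df1, 2, "Dia34").
# .data[[varia]] looks up the column whose name is stored in varia.
prMval_chr <- function(df, sort, varia) {
  df %>%
    top_n(5, .data[[varia]])
}
```

With df1 from the question, prMval(df1, 2, Dia34) and prMval_chr(df1, 2, "Dia34") should keep the same five rows as the lazyeval version. top_n() is kept here to stay close to the question; since dplyr 1.0.0 it is superseded by slice_max(), e.g. slice_max({{ varia }}, n = 5).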