Dynamically ordering columns in dplyr by passing a sorted vector of column names to select()

Time: 2015-12-03 13:19:47

Tags: r sorting dataframe dplyr

I am using the code below to generate a simple summary table:

# Data
data("mtcars")
# Lib
require(dplyr)
# Summary
mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1")) 

The code produces the expected result:

> head(mt_sum)
Source: local data frame [2 x 10]

     am mpg_min cyl_min mpg_mean cyl_mean mpg_median cyl_median mpg_max cyl_max  Freq
  (chr)   (dbl)   (dbl)    (dbl)    (dbl)      (dbl)      (dbl)   (dbl)   (dbl) (int)
1     0    10.4       4 17.14737 6.947368       17.3          8    24.4       8    19
2     1    15.0       4 24.39231 5.076923       22.8          4    33.9       8    13

However, I'm not happy with the way the columns are ordered. In particular, I would like to:

  1. Order the columns by name

  2. Achieve this via select() in dplyr

Desired order

The desired order is as follows:

    > names(mt_sum)[order(names(mt_sum))]
     [1] "am"         "cyl_max"    "cyl_mean"   "cyl_median" "cyl_min"    "Freq"       "mpg_max"   
     [8] "mpg_mean"   "mpg_median" "mpg_min" 
    

Attempt

Ideally, I would like to order the columns inside select() using names(mt_sum)[order(names(mt_sum))]. But the code:

    mt_sum <- mtcars %>%
      group_by(am) %>%
      summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
      mutate(am = as.character(am)) %>%
      left_join(y = as.data.frame(table(mtcars$am),
                                  stringsAsFactors = FALSE),
                by = c("am" = "Var1")) %>%
      select(names(.)[order(names(.))])
    

returns the expected error:

    Error: All select() inputs must resolve to integer column positions.
    The following do not:
    *  names(.)[order(names(.))]
    

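In my real data I am generating a large number of summary columns. Hence my question: how can I dynamically pass the sorted column names to select() in dplyr so that it understands them and applies them to the data.frame at hand?

My focus is on finding a way to pass dynamically generated column names to select(). I know that I could sort the columns in base R or type the names out by hand, as explained here.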

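2 answers:

Answer 0: (score: 8)

You are definitely on the right track. Base R indexing on the piped data frame accepts the sorted names directly: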

mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1")) %>%
  .[, names(.)[order(names(.))]]
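
If the goal is specifically to hand a character vector of names to select(), the following is a minimal sketch, assuming a dplyr release of 1.0.0 or later in which the all_of() helper is available (it did not exist when this thread was written). The toy data frame df and the variable sorted_names are made up for illustration and stand in for mt_sum and its column names.

```r
library(dplyr)

# Hypothetical toy data frame standing in for mt_sum
df <- data.frame(b = 1:2, c = 3:4, a = 5:6)

# Character vector of column names, built dynamically
sorted_names <- sort(names(df))

# all_of() marks the vector as a set of column names for select()
# (assumes dplyr >= 1.0.0)
by_name <- df %>% select(all_of(sorted_names))

# Base R equivalent, mirroring the indexing used in this answer
by_base <- df[, sorted_names]
```

Both by_name and by_base should hold the same columns in the same order, so the choice is mostly a matter of whether you want to stay inside the dplyr verbs.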

Answer 1: (score: 6)

You only need:

mt_sum %>% select(order(names(.)))
#Source: local data frame [2 x 10]
#
#     am cyl_max cyl_mean cyl_median cyl_min  Freq mpg_max mpg_mean mpg_median mpg_min
#  (chr)   (dbl)    (dbl)      (dbl)   (dbl) (int)   (dbl)    (dbl)      (dbl)   (dbl)
#1     0       8 6.947368          8       4    19    24.4 17.14737       17.3    10.4
#2     1       8 5.076923          4       4    13    33.9 24.39231       22.8    15.0

This works because order() returns integer column positions, which is what select() requires.
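
The difference between the two expressions can be seen in a small sketch. The data frame df below is a made-up example, not the mt_sum table from the question, and uses lowercase names so the sort order does not depend on the locale.

```r
library(dplyr)

# Hypothetical toy data frame, not taken from the question
df <- data.frame(b = 1:2, c = 3:4, a = 5:6)

# order() yields the integer positions of the columns in sorted order
pos <- order(names(df))

# Indexing names() with those positions yields a character vector instead,
# which is what the original attempt passed to select()
nms <- names(df)[pos]

# Passing the positions to select() reorders the columns
by_pos <- df %>% select(order(names(.)))
```

Here pos is an integer vector while nms is a character vector, which is why only the former satisfied the select() check quoted in the error message.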