Complex/mixed sorting of a column in a data frame

Time: 2017-07-25 12:16:16

Tags: r dataframe

I have a column like this in a data frame:

retention_completion_variable_name <- data.frame(
  retention_completion_variable_name = c(
    "Completed Degree in 1 Year",
    "Retained to Midyear Year 1",
    "Completed Degree in 2 Years",
    "Retained to Midyear Year 2",
    "Retained to Start of Year 2"
  ),
  retention_completion_value = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)   

I want to sort this column in the following order:

       Retained to Midyear Year 1                     0              
       Retained to Start of Year 2                    1        
       Retained to Midyear Year 2                     1             
       Completed Degree in 1 Year                     0            
       Completed Degree in 2 Years                    0              

1 answer:

Answer (score: 3)

This is one of the few cases where I find factor() really useful:

# Desired order of the values, used as the factor levels
lvls <- c("Retained to Midyear Year 1", "Retained to Start of Year 2", 
          "Retained to Midyear Year 2", "Completed Degree in 1 Year", 
          "Completed Degree in 2 Years")
# order() on a factor sorts by the position of its levels, not alphabetically
DT$retention_completion_variable_name <- 
  factor(DT$retention_completion_variable_name, levels = lvls)
DT <- DT[order(DT$retention_completion_variable_name), ]
DT
  retention_completion_variable_name retention_completion_value
2         Retained to Midyear Year 1                          0
5        Retained to Start of Year 2                          1
4         Retained to Midyear Year 2                          1
1         Completed Degree in 1 Year                          0
3        Completed Degree in 2 Years                          0
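If you would rather keep the column as character, the same ordering can be obtained with match() instead of factor(): match() returns each value's position in lvls, and ordering by that position gives the desired row order. This is a minimal sketch using the question's data built in base R; the name DT_sorted is only illustrative. Values that do not appear in lvls get NA from match(), and order() puts them last by default.

```r
# Sample data from the question, built in base R (no readr needed)
DT <- data.frame(
  retention_completion_variable_name = c(
    "Completed Degree in 1 Year", "Retained to Midyear Year 1",
    "Completed Degree in 2 Years", "Retained to Midyear Year 2",
    "Retained to Start of Year 2"
  ),
  retention_completion_value = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Desired order, the same vector as in the answer
lvls <- c("Retained to Midyear Year 1", "Retained to Start of Year 2",
          "Retained to Midyear Year 2", "Completed Degree in 1 Year",
          "Completed Degree in 2 Years")

# Position of each value in lvls; ordering by it sorts the rows
# while the column stays a character vector
DT_sorted <- DT[order(match(DT$retention_completion_variable_name, lvls)), ]
DT_sorted
```

The trade-off: factor() records the order in the column itself, so later sorts and plots respect it, while match() keeps the data unchanged and applies the order only at the point of sorting.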

Data

DT <- as.data.frame(readr::read_table(
  "retention_completion_variable_name      retention_completion_value     
   Completed Degree in 1 Year                         0            
   Retained to Midyear Year 1                         0              
   Completed Degree in 2 Years                        0              
   Retained to Midyear Year 2                         1             
   Retained to Start of Year 2                        1    "
))

Enhancement

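If many years have to be covered, creating the factor levels by hand becomes tedious and error-prone. However, it can be automated by following three rules: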

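  1. All "Retained" values come before all "Completed" values.
  2. Within "Retained", sort by year and, within each year, put "Start" before "Midyear".
  3. Within "Completed", sort by year.

These rules can be used to create the factor levels programmatically: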

    n_years <- 5L
    years <- seq_len(n_years)
    lvls <- c(paste(c("Retained to Start of Year", "Retained to Midyear Year"), 
                    rep(years, each = 2L)),
              # singular "Year" for 1 so that the level matches "Completed Degree in 1 Year"
              sprintf("Completed Degree in %i Year%s", years,
                      ifelse(years == 1L, "", "s")))
    lvls
    
     [1] "Retained to Start of Year 1" "Retained to Midyear Year 1"  "Retained to Start of Year 2"
     [4] "Retained to Midyear Year 2"  "Retained to Start of Year 3" "Retained to Midyear Year 3" 
     [7] "Retained to Start of Year 4" "Retained to Midyear Year 4"  "Retained to Start of Year 5"
    [10] "Retained to Midyear Year 5"  "Completed Degree in 1 Year"  "Completed Degree in 2 Years"
    [13] "Completed Degree in 3 Years" "Completed Degree in 4 Years" "Completed Degree in 5 Years"
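The generated levels are then applied exactly like the hand-written ones. The sketch below rebuilds the question's data in base R, generates the levels (with singular "Year" for the 1-year case), and stops with an error if any value has no matching level, because factor() silently turns such values into NA; a level spelled "Completed Degree in 1 Years" would not match the data value "Completed Degree in 1 Year", for example. The unmatched check is an addition for illustration, not part of the original answer.

```r
# Sample data from the question, built in base R
DT <- data.frame(
  retention_completion_variable_name = c(
    "Completed Degree in 1 Year", "Retained to Midyear Year 1",
    "Completed Degree in 2 Years", "Retained to Midyear Year 2",
    "Retained to Start of Year 2"
  ),
  retention_completion_value = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

n_years <- 5L
years <- seq_len(n_years)

# Rules 1-3: Retained (Start, then Midyear, per year) before Completed (per year)
lvls <- c(paste(c("Retained to Start of Year", "Retained to Midyear Year"),
                rep(years, each = 2L)),
          sprintf("Completed Degree in %i Year%s", years,
                  ifelse(years == 1L, "", "s")))

# factor() maps values without a matching level to NA; fail loudly instead
unmatched <- setdiff(DT$retention_completion_variable_name, lvls)
if (length(unmatched) > 0L) {
  stop("No factor level for: ", paste(unmatched, collapse = ", "))
}

DT$retention_completion_variable_name <-
  factor(DT$retention_completion_variable_name, levels = lvls)
DT <- DT[order(DT$retention_completion_variable_name), ]
DT
```

Levels that never occur in the data, such as "Retained to Start of Year 1" here, are harmless: they only reserve a position in the order, so the result matches the order asked for in the question.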