Is there a nice, tidy way to convert character columns to ordered factors?

Asked: 2020-09-20 19:33:04

Tags: r dplyr

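I have a list of columns in my data that I want to convert from character to ordered factor. My current solution is this horribly ugly construction, which works but does make the eyes burn a bit: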

load_from_file <- function(filename) {
  d <- read.csv(filename)
  d <- d[,2:37]
  d %<>% na_if("")
  for(column in alwaystonever_questions) {
    eval(parse(text=paste('d$',column,' <- factor(d$',column,', ordered=TRUE,levels=c("Never","Rarely","Sometimes","Often","Always"))',sep="")))
  }
  d$HowAreYouFeeling <- factor(d$HowAreYouFeeling,ordered=TRUE,levels=c("Bad","NotSoGood","Ok","Good","Great"))
  d %<>% mutate_if(is.character,as.factor)
  return(d)
}

I would like to do this with a simple chain of "%>%" pipes instead, which should be more readable. How can I do this in a more idiomatic way?

1 answer:

Answer 0 (score: 4)

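Here is a tidyverse solution. It assumes alwaystonever_questions is a character vector of column names. I left out the as.factor part because factor should be enough (but if it doesn't work, try adding it back; I'm never quite sure with factors):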

library(dplyr)

load_from_file <- function(filename) {
  read.csv(filename) %>%
    select(2:37) %>%
    # Turn empty strings into NA, in character columns only
    mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
    # all_of() matches the listed column names exactly
    mutate(across(all_of(alwaystonever_questions),
                  ~ factor(.x, ordered = TRUE,
                           levels = c("Never", "Rarely", "Sometimes", "Often", "Always"))),
           HowAreYouFeeling = factor(HowAreYouFeeling,
                                     ordered = TRUE,
                                     levels = c("Bad", "NotSoGood", "Ok", "Good", "Great")))
}
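
To check the approach without a CSV file, here is a minimal sketch on a small in-memory data frame. The column names (Q1, Q2, Comment) and the data are made up for illustration. It uses where(is.character) and all_of() for the column selection, which is one reasonable choice rather than the only one.

```r
library(dplyr)

# Hypothetical column names and data, for illustration only
alwaystonever_questions <- c("Q1", "Q2")
freq_levels <- c("Never", "Rarely", "Sometimes", "Often", "Always")

d <- data.frame(
  Q1 = c("Never", "Always", ""),
  Q2 = c("Often", "", "Rarely"),
  Comment = c("a", "b", ""),
  stringsAsFactors = FALSE
)

result <- d %>%
  # Empty strings become NA in every character column
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Only the listed question columns become ordered factors
  mutate(across(all_of(alwaystonever_questions),
                ~ factor(.x, ordered = TRUE, levels = freq_levels)))

# Inspect the column types: Q1 and Q2 are ordered factors, Comment stays character
str(result)
```

Because the factors are ordered, comparisons such as result$Q1 < "Sometimes" respect the level order instead of sorting alphabetically.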

If you have to read many files, you can do something like this:

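library(purrr)
# full.names = TRUE returns paths that include the directory,
# so the files can be read from any working directory
filenames <- list.files("path_to_directory", full.names = TRUE)

# A named list of data frames, one per file
list_dfs <- set_names(filenames) %>%
  map(load_from_file)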
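
The multi-file pattern can be tried end to end with throwaway files. This is only a sketch: the temporary directory, the file names, and the use of read.csv in place of load_from_file are all assumptions made so that the example is self-contained.

```r
library(purrr)

# Hypothetical setup: two small CSV files in a temporary directory
csv_dir <- file.path(tempdir(), "survey_files")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(csv_dir, "b.csv"), row.names = FALSE)

# Full paths, restricted to CSV files
filenames <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each element by its file name; read.csv stands in for load_from_file
list_dfs <- set_names(filenames, basename(filenames)) %>%
  map(read.csv)

names(list_dfs)
```

Naming the list by basename(filenames) keeps the element names short while the full paths are still used for reading.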