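The platform I import data from does not support specifying data types, so all of my columns are character. I have an Excel file that specifies which columns are factor, including the relevant labels and levels. I am now trying to write a function that dynamically changes the data types of the individual columns of my data.frame.
Thanks to the excellent answer to this question (dplyr - mutate: use dynamic variable names), I managed to write the following function, in which I pass the column name dynamically to mutate.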
readFactorData <- function(filepath) {
  t <- read.xlsx(filepath)
  sapply(nrow(t), function(i) {
    colname <- as.character(t[i, "Item"])
    factorLevels <- t[i, 3:ncol(t)][which(!is.na(t[i, 3:ncol(t)]))]
    totalLevels <- length(factorLevels)
    listOfLabels <- as.character(unlist(factorLevels))
    mutate(d, !!colname := factor(d[[colname]], labels=(1:totalLevels), levels=listOfLabels))
    # requires dplyr v.0.7+
    # the syntax `!!variablename:=` forces evaluation of the variablename before evaluating the rest of the function
  })
}
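It works: each iteration returns the whole data frame with the relevant column (colname) changed to factor. However, each iteration overwrites the previous one, so the function returns only the result for the last i. How can I make sure I end up with one single data frame in which all of the relevant columns have been converted?
Sample data (t is defined here, so make sure to comment out the first line of the function above):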
d <- data.frame("id" = sample(100:999, 10),
                "age" = sample(18:80, 10),
                "factor1" = c(rep("a", 3), rep("b", 3), rep("c", 4)),
                "factor2" = c("x","y","y","y","y","x","x","x","x","y"),
                stringsAsFactors = FALSE)
t <- data.frame("Item" = c("factor1","factor2"),
                "Label" = c("This is factor 1", "This is factor 2"),
                "level1" = c("a","x"),
                "level2" = c("b","y"),
                "level3" = c("c","NA"))
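Answer 0 (score: 0)
If I understand correctly, you have one data frame that holds the data and another that lists which of its columns are factors. You want to take the column names from the latter, mutate those columns in the former, and turn them into factors.
How about keeping a vector of the column names and then mutating them all at once?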
library(dplyr)
# column names to convert, taken from the Item column of t
colnames <- t %>%
  pull(Item) %>%
  as.character()
d_with_factors <- d %>%
  mutate_at(colnames, as.factor)
Then
sapply(d_with_factors, class)
returns
       id       age   factor1   factor2
"integer" "integer"  "factor"  "factor"
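The as.factor call above takes the levels from whatever values occur in each column, so the levels listed in the spreadsheet are not used. If those should be applied as well, one possible sketch in base R is a plain loop that assigns each converted column back into the same data frame, so nothing is overwritten between iterations. Note that sapply(nrow(t), ...) in the question visits only the single value nrow(t), that is the last row; the loop below iterates over seq_len(nrow(spec)) instead. The names apply_factor_spec and spec are hypothetical, the sample spec assumes missing levels are real NA values rather than the string "NA", and the numeric labels follow the question's own factor() call.

```r
# Small stand-ins for the question's d and t (spec is a hypothetical name for t)
d <- data.frame(
  id = 1:10,
  factor1 = c(rep("a", 3), rep("b", 3), rep("c", 4)),
  factor2 = c("x", "y", "y", "y", "y", "x", "x", "x", "x", "y"),
  stringsAsFactors = FALSE
)
spec <- data.frame(
  Item = c("factor1", "factor2"),
  Label = c("This is factor 1", "This is factor 2"),
  level1 = c("a", "x"),
  level2 = c("b", "y"),
  level3 = c("c", NA),  # assumption: a missing level is a real NA
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert every column named in spec$Item to a factor,
# using the levels found in columns 3 onwards of the matching spec row.
apply_factor_spec <- function(data, spec) {
  for (i in seq_len(nrow(spec))) {
    col <- as.character(spec$Item[i])
    lv <- unlist(spec[i, 3:ncol(spec)], use.names = FALSE)
    lv <- as.character(lv[!is.na(lv)])
    # same labelling as in the question: levels are relabelled 1, 2, ...
    data[[col]] <- factor(data[[col]], levels = lv, labels = seq_along(lv))
  }
  data  # one data frame with all requested columns converted
}

result <- apply_factor_spec(d, spec)
sapply(result, class)
```

Because the loop modifies data in place and returns it once at the end, every column named in the spec is converted in the returned data frame.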
答案 1 :(得分:0)
如果要将factor
中的所有character
转换为data.frame
,则可以使用dplyr
的{{1}}。否则,如果您想使用列名的向量, @Eden Z的答案将为您完成。
mutate_if
当您可以在类中检查变量时:
library(tidyverse)
d_out <- d %>%
  mutate_if(is.character, as.factor)
d_out
#    id age factor1 factor2
#1  933  61       a       x
#2  208  52       a       y
#3  193  25       a       y
#4  231  47       b       y
#5  595  78       b       y
#6  675  28       b       x
#7  387  71       c       x
#8  386  80       c       x
#9  893  20       c       x
#10 272  23       c       y
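In dplyr 1.0.0 and later the scoped verbs mutate_if and mutate_at are superseded by across(). As a sketch, assuming that dplyr version is installed, the same two conversions can be written as follows; the small data frame and the name factor_cols are made up for illustration.

```r
library(dplyr)

# Small illustrative data frame (not the question's random sample)
d <- data.frame(
  id = 1:4,
  factor1 = c("a", "a", "b", "c"),
  factor2 = c("x", "y", "y", "x"),
  stringsAsFactors = FALSE
)

# Select by class: every character column becomes a factor
d_out <- d %>%
  mutate(across(where(is.character), as.factor))

# Select by name: all_of() takes a character vector of column names
factor_cols <- c("factor1", "factor2")
d_named <- d %>%
  mutate(across(all_of(factor_cols), as.factor))

sapply(d_out, class)
```

Both calls leave the integer id column untouched and convert only the selected columns.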
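Answer 2 (score: 0)
The function below applies the readr::parse_* function you specify to each named column you want to change, and lets you supply arguments for each of those columns (for example levels, if you use parse_factor).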
library(tidyverse)
parse_cols <- function(df, f, col_names, levels, ...){
  # df: dataframe, f: char vec, col_names: char vec, levels: list of char vecs,
  # ...: list of other potential args for parse_*
  params_t <- tibble(x = map(col_names, ~df[[.x]]), levels = levels, ...) %>% transpose()
  new_cols <- map2_df(.x = structure(f, names = col_names),
                      .y = params_t,
                      ~R.utils::doCall(.x, args = .y, .ignoreUnusedArgs = TRUE))
  df[names(new_cols)] <- new_cols
  df
}
# function inputs -- perhaps just requiring a tibble input would be safer
parsings_vec <- c("parse_factor","parse_double", "parse_factor")
cols_vec <- c("manufacturer", "cty", "class")
factors_list <- list(unique(mpg[["manufacturer"]]), NULL, unique(mpg[["class"]]))
parse_cols(df = mpg, f = parsings_vec, col_names = cols_vec, levels = factors_list)
#> # A tibble: 234 x 11
#>    manufacturer model displ  year   cyl trans drv     cty   hwy fl    cla~
#>    <fct>        <chr> <dbl> <int> <int> <chr> <chr> <dbl> <int> <chr> <fc>
#>  1 audi         a4      1.8  1999     4 auto~ f        18    29 p     com~
#>  2 audi         a4      1.8  1999     4 manu~ f        21    29 p     com~
#>  3 audi         a4      2    2008     4 manu~ f        20    31 p     com~
#>  4 audi         a4      2    2008     4 auto~ f        21    30 p     com~
#>  5 audi         a4      2.8  1999     6 auto~ f        16    26 p     com~
#>  6 audi         a4      2.8  1999     6 manu~ f        18    26 p     com~
#>  7 audi         a4      3.1  2008     6 auto~ f        18    27 p     com~
#>  8 audi         a4 q~   1.8  1999     4 manu~ 4        18    26 p     com~
#>  9 audi         a4 q~   1.8  1999     4 auto~ 4        16    25 p     com~
#> 10 audi         a4 q~   2    2008     4 manu~ 4        20    28 p     com~
#> # ... with 224 more rows