How do I keep factor levels consistent across all columns of a data frame?

Asked: 2015-01-30 04:35:27

Tags: r matrix dataframe r-factor

I have a data frame with 5 different columns:

         Test1   Test2   Test3  Test4  Test5 
Sample1  PASS    PASS    FAIL    WARN   WARN
Sample2  PASS    PASS    FAIL    PASS   WARN
Sample3  PASS    FAIL    FAIL    PASS   WARN
Sample4  PASS    FAIL    FAIL    PASS   WARN
Sample5  PASS    WARN    FAIL    WARN   WARN

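In each column, the levels are assigned different integer codes. In column 1, "PASS" is 1. In column 2, "PASS" is 2 and "FAIL" is 1. In column 3, "FAIL" is 1. In column 4, "PASS" is 1 and "WARN" is 2. In column 5, "WARN" is 1.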

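The codes follow alphabetical order. I need "PASS" to be 1 in all columns, "WARN" to be 2 in all columns, and "FAIL" to be 3 in all columns, so that I can convert the data frame to a matrix and turn it into a heatmap.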

Currently, the codes are assigned based on which levels are present in a particular column and on their alphabetical order.

How can I keep this constant across the whole data frame?
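The behaviour described in the question can be reproduced with a minimal sketch. It assumes each column was turned into a factor on its own with a default `factor()` call; the vectors `test2` and `test4` are made up for illustration and only mirror two of the columns above.

```r
# Hypothetical columns, each converted to a factor independently
test2 <- factor(c("PASS", "PASS", "FAIL", "FAIL", "WARN"))
test4 <- factor(c("WARN", "PASS", "PASS", "PASS", "WARN"))

# By default the levels are the sorted unique values seen in that column
levels(test2)
levels(test4)

# So the same label can get a different integer code in each column
code_pass_test2 <- as.integer(test2)[1]  # code of "PASS" in test2
code_pass_test4 <- as.integer(test4)[2]  # code of "PASS" in test4
```

Because the codes depend on what each column happens to contain, they are not comparable across columns until every column shares one explicit `levels` vector.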

3 answers:

Answer 0 (score: 9)

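You can change the levels of the dataset "df" by looping over its columns (lapply), converting each one to factor again with the levels specified so that they are in the same order everywhere, and assigning the result back to the corresponding columns.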

lvls <- c('PASS', 'WARN', 'FAIL')
df[] <-  lapply(df, factor, levels=lvls)
str(df)
# 'data.frame': 5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
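Once every column shares the same levels, the integer codes can be pulled out as a matrix for the heatmap mentioned in the question. The following is a sketch, not part of the original answer: it uses a small made-up stand-in for `df` (two columns, three rows) and base R's `data.matrix()`, which replaces factors by their internal codes.

```r
# Small stand-in data frame, for illustration only
df <- data.frame(
  Test1 = c("PASS", "PASS", "PASS"),
  Test2 = c("PASS", "FAIL", "WARN"),
  row.names = c("Sample1", "Sample2", "Sample3"),
  stringsAsFactors = TRUE
)

# Same re-leveling step as in the answer
lvls <- c("PASS", "WARN", "FAIL")
df[] <- lapply(df, factor, levels = lvls)

# data.matrix() converts each factor column to its integer codes
m <- data.matrix(df)
m

# heatmap(m, Rowv = NA, Colv = NA, scale = "none")  # uncomment to plot
```

With the shared levels, 1 means "PASS", 2 means "WARN" and 3 means "FAIL" in every column of `m`.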

If you choose to use data.table:

library(data.table)
setDT(df)[, names(df):= lapply(.SD, factor, levels=lvls)]

setDT converts the "data.frame" to a "data.table", and the re-converted factor columns (lapply(..)) are assigned (:=) to the dataset's column names. .SD stands for "Subset of Data.table".

Data

df <- structure(list(Test1 = structure(c(1L, 1L, 1L, 1L, 1L), 
.Label = "PASS", class = "factor"), 
  Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL", 
 "PASS", "WARN"), class = "factor"), Test3 = structure(c(1L, 
 1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"), Test4 = 
 structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"), 
 class = "factor"), Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = 
"WARN", class = "factor")), .Names = c("Test1", 
"Test2", "Test3", "Test4", "Test5"), row.names = c("Sample1", 
"Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame")

Answer 1 (score: 3)

Using dplyr:

library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
df <- df %>% mutate(across(everything(), ~ factor(.x, levels = c('PASS', 'WARN', 'FAIL'))))

You get:

#> str(df)
#'data.frame':  5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2

Answer 2 (score: 1)

A more general approach, assuming your string data.frame may also contain NA values:

library(magrittr)

fac = df %>% as.matrix %>% as.vector %>% unique
df1 = data.frame(lapply(df, factor, levels = fac[!is.na(fac)]))
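One thing to keep in mind with this approach: `unique()` returns values in order of first appearance (reading the matrix column by column), so the shared levels are consistent across columns but not necessarily in the PASS, WARN, FAIL order the question asks for. A small sketch in base R, using a made-up data frame with one NA, shows this:

```r
# Made-up data frame with a missing value, for illustration only
df <- data.frame(
  Test1 = c("PASS", "PASS", NA),
  Test2 = c("FAIL", "WARN", "PASS"),
  stringsAsFactors = TRUE
)

# Collect every distinct non-NA value across all columns
fac <- unique(as.vector(as.matrix(df)))
fac <- fac[!is.na(fac)]
fac  # order of first appearance, not a chosen order

# Apply the shared levels to every column; NA stays NA
df1 <- data.frame(lapply(df, factor, levels = fac))
str(df1)
```

If a specific coding is needed, passing an explicit vector such as `c('PASS', 'WARN', 'FAIL')` as `levels` fixes the order.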