I have a data frame with 5 different columns:
Test1 Test2 Test3 Test4 Test5
Sample1 PASS PASS FAIL WARN WARN
Sample2 PASS PASS FAIL PASS WARN
Sample3 PASS FAIL FAIL PASS WARN
Sample4 PASS FAIL FAIL PASS WARN
Sample5 PASS WARN FAIL WARN WARN
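Within each column, the levels are assigned different integer codes. In column 1, "PASS" is 1. In column 2, "PASS" is 2 and "FAIL" is 1. In column 3, "FAIL" is 1. In column 4, "PASS" is 1 and "WARN" is 2. In column 5, "WARN" is 1.
I need "PASS" to be 1 in all columns, "WARN" to be 2 in all columns, and "FAIL" to be 3 in all columns, so that I can convert the data frame to a matrix and turn it into a heatmap.
Currently, the codes are assigned based on which levels appear in a particular column, in alphabetical order.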
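The behaviour described above can be reproduced in a few lines. By default, factor() uses the sorted unique values of the vector it is given as its levels, so each column ends up with its own coding. The two vectors below copy the Test2 and Test4 columns from the table; the names test2 and test4 are only for illustration.

```r
# Each vector is converted on its own, so the levels come from that vector only
test2 <- factor(c("PASS", "PASS", "FAIL", "FAIL", "WARN"))
test4 <- factor(c("WARN", "PASS", "PASS", "PASS", "WARN"))

levels(test2)      # "FAIL" "PASS" "WARN"
as.integer(test2)  # 2 2 1 1 3   (PASS is coded 2 here)

levels(test4)      # "PASS" "WARN"
as.integer(test4)  # 2 1 1 1 2   (PASS is coded 1 here)
```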
How can I keep the coding consistent across the whole data frame?
Answer 0: (score: 9)
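You can change the levels of the columns in the dataset "df" by looping over them (lapply), converting each one to factor again with the levels specified explicitly so that they are in the same order everywhere, and assigning the result back to the corresponding columns.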
lvls <- c('PASS', 'WARN', 'FAIL')
df[] <- lapply(df, factor, levels=lvls)
str(df)
# 'data.frame': 5 obs. of 5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
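Once every column shares the same levels, the integer codes can be pulled into a matrix for the heatmap mentioned in the question. This is a minimal sketch on made-up two-row data rather than the full example; data.matrix() replaces factor columns by their internal codes. The colour choice is an assumption, and the one-colour-per-code mapping only holds when codes 1 and 3 both occur in the matrix.

```r
lvls <- c("PASS", "WARN", "FAIL")

# Small illustrative data frame (not the full data from the question)
df <- data.frame(
  Test1 = c("PASS", "PASS"),
  Test2 = c("PASS", "FAIL"),
  Test3 = c("FAIL", "WARN"),
  row.names = c("Sample1", "Sample2"),
  stringsAsFactors = TRUE
)

# Same step as above: give every column the same levels in the same order
df[] <- lapply(df, factor, levels = lvls)

# Factor columns become their integer codes: PASS = 1, WARN = 2, FAIL = 3
m <- data.matrix(df)
m

# No clustering or scaling, so the cells show the raw codes
heatmap(m, Rowv = NA, Colv = NA, scale = "none",
        col = c("green", "orange", "red"))
```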
If you prefer to use data.table:
library(data.table)
setDT(df)[, names(df):= lapply(.SD, factor, levels=lvls)]
setDT converts the "data.frame" to a "data.table" by reference, and := assigns the re-converted factor columns (lapply(..)) to the column names of the dataset. .SD stands for "Subset of Data.table".
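One caveat: a data.table has no row names, so setDT() discards Sample1 to Sample5. A possible workaround, sketched here on made-up two-row data, is to pass keep.rownames = TRUE (which stores the row names in a column named rn) and to restrict the conversion to the test columns with .SDcols. The variable name cols is only for illustration.

```r
library(data.table)

lvls <- c("PASS", "WARN", "FAIL")

# Small illustrative data frame with row names
df <- data.frame(
  Test1 = c("PASS", "PASS"),
  Test2 = c("PASS", "FAIL"),
  row.names = c("Sample1", "Sample2")
)

# Convert by reference and keep the row names in a column called "rn"
setDT(df, keep.rownames = TRUE)

# Convert only the test columns, leaving "rn" untouched
cols <- setdiff(names(df), "rn")
df[, (cols) := lapply(.SD, factor, levels = lvls), .SDcols = cols]
str(df)
```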
# Data used in the examples above
df <- structure(list(Test1 = structure(c(1L, 1L, 1L, 1L, 1L),
.Label = "PASS", class = "factor"),
Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL",
"PASS", "WARN"), class = "factor"), Test3 = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"), Test4 =
structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"),
class = "factor"), Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label =
"WARN", class = "factor")), .Names = c("Test1",
"Test2", "Test3", "Test4", "Test5"), row.names = c("Sample1",
"Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame")
Answer 1: (score: 3)
Using dplyr:
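library(dplyr)
# mutate_each() and funs() are deprecated; across() (dplyr >= 1.0.0) does the same job
df <- df %>% mutate(across(everything(), ~ factor(.x, levels = c('PASS', 'WARN', 'FAIL'))))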
You get:
#> str(df)
#'data.frame': 5 obs. of 5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
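Whichever approach is used, a quick check that every column really ended up with the same levels can be written in base R. This is a small sketch on made-up two-column data; the name same_levels is only for illustration.

```r
lvls <- c("PASS", "WARN", "FAIL")

# Small illustrative data frame
df <- data.frame(
  Test1 = c("PASS", "PASS"),
  Test2 = c("FAIL", "WARN"),
  stringsAsFactors = TRUE
)
df[] <- lapply(df, factor, levels = lvls)

# TRUE for each column whose levels match lvls exactly, including order
same_levels <- vapply(df, function(col) identical(levels(col), lvls), logical(1))
stopifnot(all(same_levels))

# The integer codes now mean the same thing in every column
codes <- sapply(df, as.integer)
codes
```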
Answer 2: (score: 1)
A more general approach, assuming your data.frame can contain other strings as well as NA values:
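library(magrittr)
# Collect every distinct value in the data frame, column by column
fac = df %>% as.matrix %>% as.vector %>% unique
# Use the non-NA values as the shared levels; their order is the order of
# first appearance in the data, not a fixed PASS/WARN/FAIL order
df1 = data.frame(lapply(df, factor, levels = fac[!is.na(fac)]))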
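If a fixed order matters for the known values, one possible variation in base R puts a preferred order first and appends any other values found in the data. The toy data, the extra "SKIP" value, and the names preferred and found are all made up for illustration.

```r
# Toy data with an NA and an unexpected value ("SKIP")
df <- data.frame(
  A = c("PASS", NA, "FAIL"),
  B = c("WARN", "PASS", "SKIP"),
  stringsAsFactors = FALSE
)

preferred <- c("PASS", "WARN", "FAIL")

# All distinct non-NA values that occur anywhere in the data frame
found <- unique(unlist(lapply(df, as.character)))
found <- found[!is.na(found)]

# Known values first, in the preferred order, then anything else
lvls <- c(intersect(preferred, found), setdiff(found, preferred))

df[] <- lapply(df, factor, levels = lvls)

lvls               # "PASS" "WARN" "FAIL" "SKIP"
as.integer(df$A)   # 1 NA 3
as.integer(df$B)   # 2 1 4
```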