I am trying to convert four character variables in this dataset to ordered factors. The logic I am trying to apply is shown below for variable "A":
df$A = factor(ifelse(df$A %in% c('NOT CONDUCTED AT ALL','RARELY'),'L1',
              ifelse(df$A == 'OCCASIONALLY', 'L2',
              ifelse(df$A == 'QUITE FREQUENTLY', 'L3', 'L4'))))
df$A = ordered(factor(df$A), levels=c('L1','L2','L3','L4'))
Is there a way to convert all of the variables at once using the same conditions?
Answer 0 (score: 3)
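# first, make sure each column is an ordered factor with all of the levels
# if the columns of df are character, replace levels(unlist(df)) with unique(unlist(df))
df <- data.frame(lapply(df, factor, ordered=TRUE, levels=levels(unlist(df))))
# create mapping to the new levels following the order
# imposed in the previous step; check levels(df[[1]]) first, because the
# vector below assumes one particular level order
new.lvl.mapping <- c('L4', 'L2', 'L3', 'L1', 'L1')
# make the replacement using the mapping; indexing by a factor uses its
# integer codes, so position i of the mapping replaces level i
# wrapping the result in factor(..., ordered=TRUE) keeps the columns ordered
df <- data.frame(lapply(df, function(col) factor(new.lvl.mapping[col], ordered=TRUE)))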
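Since the positional mapping above only works if it lines up with the level order, here is a minimal sketch of one way to derive it instead of typing it by hand. The toy data and the names `cats`, `toy`, `lv`, `rule`, `new.lvl` and `recoded` are made up for illustration, and the five categories are assumed to be the only values present.

```r
# Hypothetical toy data: two character columns using the five answer categories
cats <- c("NOT CONDUCTED AT ALL", "RARELY", "OCCASIONALLY",
          "QUITE FREQUENTLY", "ALWAYS")
toy <- data.frame(A = cats, B = rev(cats), stringsAsFactors = FALSE)

# One shared level set; sort() makes the order explicit (alphabetical here)
lv <- sort(unique(unlist(toy)))
toy[] <- lapply(toy, factor, levels = lv, ordered = TRUE)

# State the recoding by name, then derive the positional vector from lv
rule <- c("NOT CONDUCTED AT ALL" = "L1", "RARELY" = "L1",
          "OCCASIONALLY" = "L2", "QUITE FREQUENTLY" = "L3", "ALWAYS" = "L4")
new.lvl <- unname(rule[lv])
new.lvl
# "L4" "L1" "L2" "L3" "L1"

# Indexing by a factor uses its integer codes, i.e. positions in lv
recoded <- data.frame(lapply(toy, function(col)
  factor(new.lvl[col], levels = c("L1", "L2", "L3", "L4"), ordered = TRUE)))
```

Writing the rule by name means the level order can change (for example when switching between `levels()` and `unique()`) without silently scrambling the recoding.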
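Answer 1 (score: 1)
You can define a function that creates the desired factor and use sapply() (or lapply(), which keeps the factor class of each column) to apply it to every column of the data frame you want at once.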
# some fake data for example
dat <- c("NOT CONDUCTED AT ALL", "RARELY", "OCCASIONALLY", "QUITE FREQUENTLY", "ALWAYS")
df <- data.frame(A=sample(dat, 25, TRUE), B=sample(dat, 25, TRUE), D=rnorm(25))
head(df)
                     A                B           D
1 NOT CONDUCTED AT ALL QUITE FREQUENTLY -0.04049165
2     QUITE FREQUENTLY QUITE FREQUENTLY  0.74361906
3               ALWAYS     OCCASIONALLY -0.93606555
4               ALWAYS           ALWAYS  0.56659322
5               RARELY     OCCASIONALLY  0.97216491
6     QUITE FREQUENTLY     OCCASIONALLY  0.91125383
# define a function to create a new factor variable
newfac <- function(x, oldval, newval, ordered=TRUE) {
  # levels follow the order in which they first appear in newval
  factor(newval[match(x, oldval)], levels=unique(newval), ordered=ordered)
}
# apply the function to each specified element of the data frame
# (lapply, because sapply would simplify the result to a character matrix)
df[, c("A", "B")] <- lapply(df[, c("A", "B")], newfac,
  oldval=c("NOT CONDUCTED AT ALL", "RARELY", "OCCASIONALLY", "QUITE FREQUENTLY", "ALWAYS"),
  newval=c("L1", "L1", "L2", "L3", "L4")
)
head(df)
   A  B           D
1 L1 L3 -0.04049165
2 L3 L3  0.74361906
3 L4 L2 -0.93606555
4 L4 L4  0.56659322
5 L1 L2  0.97216491
6 L3 L2  0.91125383
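One caveat when applying a factor-producing function across columns: the printed values look the same either way, but sapply() simplifies its result to a matrix, which cannot hold factors. A small sketch of the difference, using a made-up data frame `x` and a hypothetical helper `to_ord`:

```r
# Hypothetical two-column example
x <- data.frame(A = c("RARELY", "ALWAYS"),
                B = c("ALWAYS", "OCCASIONALLY"),
                stringsAsFactors = FALSE)

# Stand-in for any function that returns an ordered factor
to_ord <- function(v) factor(v, ordered = TRUE)

via_sapply <- sapply(x, to_ord)  # simplified to a character matrix
via_lapply <- lapply(x, to_ord)  # list with one ordered factor per column

class(via_sapply[, "A"])   # character
class(via_lapply$A)        # ordered factor
```

If the columns need to stay ordered factors after assignment back into the data frame, checking with is.ordered() on one of them is a quick way to confirm.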
Answer 2 (score: 0)
I would create another data.frame and merge, rather than nesting ifelse:
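df2 <- data.frame(New_A = c("L1", "L1", "L2", "L3", "L4"),
                  Old_A = c("NOT CONDUCTED AT ALL", "RARELY",
                            "OCCASIONALLY", "QUITE FREQUENTLY", "ALWAYS"))
# merge() reorders rows, so keep a helper row index (row_id) to restore the order
df$row_id <- seq_len(nrow(df))
df3 <- merge(df, df2, by.x="A", by.y="Old_A", all.x=TRUE)
df$A <- df3$New_A[order(df3$row_id)]
df$row_id <- NULL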
As for converting all of the variables, you can do this with apply:
apply(df, 2, function(X) ...
Each column is passed to the function as X.
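The function body is left open above. As one possible way to fill in the idea (an assumption, not necessarily what the answer had in mind), the lookup table can be applied per column with match(). This sketch uses lapply() on the selected columns instead of apply(), because apply() first converts the data frame to a matrix and so cannot return factors. The names `lookup`, `survey` and `cols` are hypothetical.

```r
# Lookup table: old category -> new level (same pairs as in the thread)
lookup <- data.frame(
  Old = c("NOT CONDUCTED AT ALL", "RARELY", "OCCASIONALLY",
          "QUITE FREQUENTLY", "ALWAYS"),
  New = c("L1", "L1", "L2", "L3", "L4"),
  stringsAsFactors = FALSE
)

# Hypothetical data: two survey columns plus an unrelated numeric column
survey <- data.frame(
  A = c("RARELY", "ALWAYS", "OCCASIONALLY"),
  B = c("QUITE FREQUENTLY", "NOT CONDUCTED AT ALL", "ALWAYS"),
  D = c(0.1, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# Recode only the listed columns; each column arrives in the function as X
cols <- c("A", "B")
survey[cols] <- lapply(survey[cols], function(X) {
  factor(lookup$New[match(X, lookup$Old)],
         levels = unique(lookup$New), ordered = TRUE)
})
```

Values that do not appear in `lookup$Old` come out as NA here, which differs from the nested ifelse in the question, where everything else falls through to 'L4'.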