I'm trying to convert factor variables to numeric. I tried these two solutions:
as.numeric(levels(f))[f]
as.numeric(as.character(f))
but the problem persists. Warning message: NAs introduced by coercion
Reproducible example:
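A minimal sketch of where the warning comes from, using a hypothetical two-value factor `f` built from labels like those in the question. Both attempts end up calling `as.numeric()` on the label text, which is not a number, so every element becomes `NA`. The factor's integer codes, by contrast, are always available. The `sub()` line is only an illustration of what extracting the number before the colon would look like, in case that were the goal.

```r
# Hypothetical factor built from two of the question's labels
f <- factor(c("11: Credit History <6 Months", "10: Already Delinquent 90+"))

# Both attempts reduce to as.numeric() on the label text,
# which is not a parseable number, so every element becomes NA
label_nums <- suppressWarnings(as.numeric(as.character(f)))
print(label_nums)  # NA NA

# Illustration only: keep the text before the first colon, then convert it
prefix_nums <- as.numeric(sub(":.*", "", as.character(f)))
print(prefix_nums)  # 11 10

# The factor's internal codes: each value's position in levels(f)
codes <- as.integer(f)
print(codes)  # 2 1
```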
df = data.frame(x = c("10: Already Delinquent 90+",
"11: Credit History <6 Months",
"12: Current Balance = 0",
"13: Balance (2-6)=0",
"20: 1+ x 90+",
"30: 3+ x 60-89",
"31: 2 x 60-89",
"32: 1 x 60-89",
"40: 3+ x 30-59",
"41: 2 x 30-59",
"42: 1 x 30-59",
"50: Insufficient Performance",
"60: 3+ x 1-29",
"61: 2 x 1-29",
"62: 1 x 1-29",
"70: Never delinquent"),
y = c("00:Bad",
"01:Ind",
"02:Good",
"NA",
"00:Bad",
"01:Ind",
"02:Good",
"NA",
"00:Bad",
"01:Ind",
"02:Good",
"NA",
"00:Bad",
"01:Ind",
"02:Good",
"NA"),
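z = ceiling(rnorm(16)),
# Since R 4.0.0, data.frame() keeps strings as character unless told otherwise
stringsAsFactors = TRUE)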
#Select all the factor variables
factorvars = colnames(df)[which(sapply(df,is.factor))]
#Concatenate with "_Num"
xxx <- paste(factorvars, "_Num", sep="")
#Converting Factor to Numeric
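for (i in seq_along(factorvars)) {
  df[,xxx[i]] = NA
  # This is where the warning appears: labels such as "10: Already Delinquent 90+"
  # are not numbers, so as.numeric() returns NA ("NAs introduced by coercion")
  df[,xxx[i]] = as.numeric(levels(df[,factorvars[i]])[df[,factorvars[i]]])
}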
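I want to keep the factor variables and create new variables in which the levels are converted to numbers. The desired output is shown below: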
x                             y       x_num y_num
10: Already Delinquent 90+    00:Bad      1     1
11: Credit History <6 Months  01:Ind      2     2
12: Current Balance = 0       02:Good     3     3
13: Balance (2-6)=0           NA          4    NA
20: 1+ x 90+                  00:Bad      5     1
30: 3+ x 60-89                01:Ind      6     2
31: 2 x 60-89                 02:Good     7     3
32: 1 x 60-89                 NA          8    NA
40: 3+ x 30-59                00:Bad      9     1
41: 2 x 30-59                 01:Ind     10     2
42: 1 x 30-59                 02:Good    11     3
50: Insufficient Performance  NA         12    NA
60: 3+ x 1-29                 00:Bad     13     1
61: 2 x 1-29                  01:Ind     14     2
62: 1 x 1-29                  02:Good    15     3
70: Never delinquent          NA         16    NA
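The numbers in the desired output are each value's position in `levels()`. By default `factor()` sorts levels as strings, and because every `x` label starts with a two-digit number, string order happens to match numeric order here. A sketch with hypothetical labels (not from the question) shows where that would break, and how passing `levels =` explicitly controls the codes:

```r
# Hypothetical labels: one starts with a single digit
labs <- c("9: Low", "10: High")

# Default levels are sorted as strings, so "10: High" comes before "9: Low"
f_default <- factor(labs)
default_codes <- as.integer(f_default)
print(levels(f_default))  # "10: High" "9: Low"
print(default_codes)      # 2 1

# Supplying the level order explicitly determines the codes instead
f_ordered <- factor(labs, levels = c("9: Low", "10: High"))
ordered_codes <- as.integer(f_ordered)
print(ordered_codes)      # 1 2
```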
Answer
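Judging by your desired output, you don't actually want to convert the factors to the numbers contained in their strings. Instead, you want the factors' internal representation (their integer codes).
Try this: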
# as.numeric() applied to a factor returns its internal integer codes (positions in levels())
df[,xxx] <- lapply(df[,factorvars], as.numeric)
#     x                             y        z x_Num y_Num
# 1   10: Already Delinquent 90+    00:Bad   2     1     1
# 2   11: Credit History <6 Months  01:Ind   2     2     2
# 3   12: Current Balance = 0       02:Good  1     3     3
# 4   13: Balance (2-6)=0           <NA>     1     4    NA
# 5   20: 1+ x 90+                  00:Bad   0     5     1
# 6   30: 3+ x 60-89                01:Ind   0     6     2
# 7   31: 2 x 60-89                 02:Good  0     7     3
# 8   32: 1 x 60-89                 <NA>     0     8    NA
# 9   40: 3+ x 30-59                00:Bad   2     9     1
# 10  41: 2 x 30-59                 01:Ind   0    10     2
# 11  42: 1 x 30-59                 02:Good  0    11     3
# 12  50: Insufficient Performance  <NA>     1    12    NA
# 13  60: 3+ x 1-29                 00:Bad   1    13     1
# 14  61: 2 x 1-29                  01:Ind  -1    14     2
# 15  62: 1 x 1-29                  02:Good -1    15     3
# 16  70: Never delinquent          <NA>    -1    16    NA
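The same idea as a self-contained sketch, using a four-row subset of the question's data with the missing `y` value already stored as a real `NA`. It uses `as.integer()` instead of `as.numeric()`, which returns the same codes typed as integers. `stringsAsFactors = TRUE` is an addition: since R 4.0.0 `data.frame()` keeps strings as character by default, so without it `sapply(df, is.factor)` would find no factor columns.

```r
# Four-row subset of the question's data; z is omitted
df <- data.frame(
  x = c("10: Already Delinquent 90+", "11: Credit History <6 Months",
        "12: Current Balance = 0", "13: Balance (2-6)=0"),
  y = c("00:Bad", "01:Ind", "02:Good", NA),
  stringsAsFactors = TRUE  # required in R >= 4.0.0 to get factor columns
)

# Names of the factor columns and of the new code columns
factorvars <- names(df)[sapply(df, is.factor)]
xxx <- paste0(factorvars, "_Num")

# Add one integer-code column per factor; missing values stay NA
df[xxx] <- lapply(df[factorvars], as.integer)
print(df)
```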
Data
I cleaned up your example data by changing the string "NA" to an actual NA value:
is.na(df$y) <- df$y == "NA"
df$y <- droplevels(df$y)
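This cleanup matters because the string "NA" is otherwise just another level: it sorts after the "0x:" labels and receives its own code instead of becoming missing. A sketch on a standalone hypothetical factor `y`, applying the same two steps as above:

```r
# Hypothetical column where missing values were entered as the string "NA"
y <- factor(c("00:Bad", "01:Ind", "02:Good", "NA"))
before <- as.integer(y)  # "NA" is an ordinary level, so it gets code 4

# Mark those entries as missing, then drop the now-unused "NA" level
is.na(y) <- y == "NA"
y <- droplevels(y)
after <- as.integer(y)

print(before)     # 1 2 3 4
print(after)      # 1 2 3 NA
print(levels(y))  # "00:Bad" "01:Ind" "02:Good"
```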