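I am working with the Lending Club dataset and am trying to create a dummy variable for the target variable loan_status. My main goal is for Charged Off to be 0, Fully Paid to be 1, and everything else to be 'NA'. The variable loan_status takes several values: Current, Fully Paid, Late, In Grace Period, Default, Charged Off, and Does not meet the credit policy. I only want to focus on Charged Off and Fully Paid. I have tried many times without success. For example:
# Create new target variable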
loan_status1 <- if(loan_status== 'Fully Paid'){'Yes'} else if
(loan_status== 'Charged Off') {'No'} else 'NA'
I also tried this:
if(loan_status=='Fully Paid'){
0} else if (loan_status=='Charged Off') {
1} else (loan_status=='NA')
I would appreciate any guidance.
Answer 0: (score: 0)
Basically, you can try running a for loop over the data as shown below. Do not set NA as a string ('NA'); it is better to use the actual NA value.
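loan_status <- sample(rep(c('Fully Paid', 'Charged Off', "abc"), 100), 100, replace = FALSE)
# store the result in a separate integer vector, pre-filled with NA,
# so that the character vector loan_status is not coerced or overwritten
target <- rep(NA_integer_, length(loan_status))
for (i in seq_along(loan_status)) {
  if (loan_status[i] == 'Fully Paid') {
    target[i] <- 0L
  } else if (loan_status[i] == 'Charged Off') {
    target[i] <- 1L
  }
  # any other status keeps the pre-filled NA
}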
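The same recoding can also be written without an explicit loop, since comparisons such as == are vectorized in R. The following is only a minimal sketch: the sample values and the name target are assumptions made for illustration, and the 0/1 mapping follows the loop in this answer.

```r
# illustrative sample data (assumed values, not taken from the Lending Club file)
loan_status <- c("Fully Paid", "Charged Off", "Current", "Fully Paid")

# start with NA everywhere, then overwrite only the two statuses of interest
target <- rep(NA_integer_, length(loan_status))
target[loan_status == "Fully Paid"] <- 0L
target[loan_status == "Charged Off"] <- 1L
target
```

Every status other than the two named ones keeps the initial NA, which is a real missing value rather than the string 'NA'.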
Maybe you would rather do this more simply with the factor() function.
For example, you could do this:
factor(loan_status, levels = c('Fully Paid', 'Charged Off'), labels = c(0, 1))
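Answer 1: (score: 0)
The OP has requested a 1:1 replacement of selected values, i.e., only one data field is involved. Besides the nested ifelse approach, a factor or, for larger data, a join can also be used.
If more than two or three values need to be replaced, the "hard-coded" nested ifelse approach easily becomes unwieldy.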
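For reference, a nested ifelse call for the two values in the question might look like the sketch below. The sample vector reuses the example values from this answer; the name target and the 0/1 mapping (Fully Paid as 0, Charged Off as 1) are assumptions made for illustration.

```r
# toy data, reusing the example values from this answer
loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")

# ifelse() is vectorized, unlike if (), which expects a single TRUE/FALSE
target <- ifelse(loan_status == "Fully Paid", 0L,
                 ifelse(loan_status == "Charged Off", 1L, NA_integer_))
target
# expected values: 0, 1, NA, NA
```

Each additional value to be replaced adds another level of nesting, which is why this form gets hard to read quickly.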
Alternatively, a factor can be used:
# create some data
loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")
# do the conversion
factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("Yes", "No"))
#[1] Yes No <NA> <NA>
#Levels: Yes No
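The call above returns a factor. If the expected result is of type character:
as.character(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("Yes", "No")))
#[1] "Yes" "No" NA NA
If the expected result is of type integer, the factor approach can still be used but requires an additional conversion:
as.integer(as.character(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("0", "1"))))
#[1] 0 1 NA NA
Note that the conversion to character is essential here. Otherwise, the result would be the integer codes of the factor levels:
as.integer(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("0", "1")))
#[1] 1 2 NA NA
If the data is large and many items need to be replaced, a join may be worth considering:
library(data.table)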
# create translation table
translation_map <- data.table(
  loan_status = c("Fully Paid", "Charged Off"),
  target = c(0L, 1L))
# create some user data
DT <- data.table(id = LETTERS[1:4],
                 loan_status = c("Fully Paid", "Charged Off", "Something", "Else"))
DT
#   id loan_status
#1:  A  Fully Paid
#2:  B Charged Off
#3:  C   Something
#4:  D        Else
# right join
translation_map[DT, on = "loan_status"]
#   loan_status target id
#1:  Fully Paid      0  A
#2: Charged Off      1  B
#3:   Something     NA  C
#4:        Else     NA  D
By default (nomatch = NA), data.table performs a right join, i.e., it takes all rows of DT.
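If data.table is not available, a similar lookup can be sketched in base R with match(), which returns NA for values that have no entry in the translation table. This is a swapped-in base R technique, not part of the data.table answer above; the data frame name DF is an assumption made for illustration.

```r
# translation table and user data as plain data frames (DF is a hypothetical name)
translation_map <- data.frame(
  loan_status = c("Fully Paid", "Charged Off"),
  target = c(0L, 1L),
  stringsAsFactors = FALSE)
DF <- data.frame(
  id = LETTERS[1:4],
  loan_status = c("Fully Paid", "Charged Off", "Something", "Else"),
  stringsAsFactors = FALSE)

# match() gives the row position in the translation table, or NA if absent;
# indexing target with NA yields NA, so unmatched statuses stay missing
DF$target <- translation_map$target[match(DF$loan_status, translation_map$loan_status)]
DF
```

Like the join, this keeps all rows of the user data and leaves the row order unchanged.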