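Creating a dummy variable in R using loan default data

Time: 2017-03-23 04:15:28

Tags: r

I am working with the Lending Club dataset and I am trying to create a dummy variable for the target variable loan_status. My main goal is to have Charged Off as 0, Fully Paid as 1, and everything else as NA. The variable loan_status takes several values: Current, Fully Paid, Late, In Grace Period, Default, Charged Off, and Does not meet the credit policy. I only want to focus on Charged Off and Fully Paid. I have tried many times without success. For example:

Creating the new target variable: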

loan_status1 <- if(loan_status== 'Fully Paid'){'Yes'} else if
 (loan_status== 'Charged Off') {'No'} else 'NA'

I also tried this:

if(loan_status=='Fully Paid'){
   0} else if (loan_status=='Charged Off') {
   1} else (loan_status=='NA')

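I would appreciate any guidance.

2 Answers:

Answer 0: (score: 0)

Basically, you can run a for loop over the data as shown below. Do not set NA as a string ('NA'); use the actual missing value NA instead.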

loan_status <- sample(rep(c('Fully Paid', 'Charged Off', "abc"), 100), 100, replace = FALSE)

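# collect the result in a separate integer vector, initialised to NA,
# so the original character vector is not overwritten or coerced
target <- rep(NA_integer_, length(loan_status))

for (i in seq_along(loan_status)){
  if (loan_status[i] == 'Fully Paid'){
    target[i] <- 0L
  } else if (loan_status[i] == 'Charged Off'){
    target[i] <- 1L
  }
  # any other status keeps the initial NA
}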

Alternatively, the factor() function does this more easily. For example:

factor(loan_status, levels = c('Fully Paid', 'Charged Off'), labels = c(0, 1))

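Answer 1: (score: 0)

The OP has requested a 1:1 replacement of selected values, i.e., only one data field is involved. Besides the nested ifelse approach, factors or joins can be used for larger data.

If more than two or three values need to be replaced, the "hard coded" nested ifelse approach quickly becomes unwieldy.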
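For reference, the nested ifelse approach might look like the following minimal sketch. The sample vector and the name target are made up for illustration, and the 0/1 coding (Fully Paid as 0, Charged Off as 1) is an assumption that can be swapped as needed.

```r
# illustrative data, not taken from the Lending Club file
loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")

# ifelse() is vectorised, unlike if(), so it handles the whole vector at once
target <- ifelse(loan_status == "Fully Paid", 0L,
                 ifelse(loan_status == "Charged Off", 1L, NA_integer_))
target
#[1]  0  1 NA NA
```

Each additional status needs one more nested ifelse() call, which is why this does not scale well.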

Factor case 1: Yes, No

# create some data
loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")
# do the conversion
factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("Yes", "No"))
#[1] Yes  No   <NA> <NA>
#Levels: Yes No

or,

as.character(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("Yes", "No")))
#[1] "Yes" "No"  NA    NA  

if the expected result is character.

Factor case 2: 0L, 1L as integer

If the expected result is of type integer, the factor approach can still be used but needs an additional conversion.

as.integer(as.character(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("0", "1"))))
#[1]  0  1 NA NA

Note that the conversion to character is essential here. Otherwise, the result would be the integer codes of the factor levels:

as.integer(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("0", "1")))
#[1]  1  2 NA NA

Join

If the data is large and many items need to be replaced, a join using data.table might be worth considering:

library(data.table)
# create translation table
translation_map <- data.table(
  loan_status = c("Fully Paid", "Charged Off"),
  target = c(0L, 1L))
# create some user data
DT <- data.table(id = LETTERS[1:4],
                 loan_status = c("Fully Paid", "Charged Off", "Something", "Else"))
DT
#   id loan_status
#1:  A  Fully Paid
#2:  B Charged Off
#3:  C   Something
#4:  D        Else
# right join
translation_map[DT, on = "loan_status"]
#   loan_status target id
#1:  Fully Paid      0  A
#2: Charged Off      1  B
#3:   Something     NA  C
#4:        Else     NA  D

By default (nomatch = NA), data.table performs a right join, i.e., it keeps all rows of DT.
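If data.table is not available, a similar lookup can be sketched in base R with a named vector. This is a substitute technique, not the join described in this answer; the names lookup and target are hypothetical.

```r
# illustrative data
loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")

# named vector acting as a small translation table
lookup <- c("Fully Paid" = 0L, "Charged Off" = 1L)

# index by name; statuses without an entry come back as NA
target <- unname(lookup[loan_status])
target
#[1]  0  1 NA NA
```

Adding another status only requires one more entry in lookup, so the mapping stays in one place.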