My employment length data looks like this:
chr [1:39786] "10+ years" "< 1 year" "10+ years" "10+ years" "1 year" "3 years" "8 years" "9 years" "4 years"
Currently I'm using the script below to convert it to numeric data. It's awful.
# Strip the unit so only "10+", "< 1", "n/a" or a plain number remains.
newdata$emp_length <- gsub(" years", "", newdata$emp_length)
newdata$emp_length <- gsub(" year", "", newdata$emp_length)
# Treat "n/a" as 0 and "< 1" as half a year.
newdata$emp_length <- gsub("n/a", "0", newdata$emp_length)
newdata$emp_length <- gsub("< 1", "0.5", newdata$emp_length)
set.seed(1)
# Positions 1..n within the "10+" entries.
temp1 <- c(1:length(newdata$emp_length[which(newdata$emp_length == '10+')]))
# Draw nine disjoint random samples; sizes are fractions of the hard-coded 8899.
for (w in c(0.18,0.17,0.15,0.13,0.10,0.09,0.07,0.05,0.03)) {
  assign(paste0("sample", w), sample(temp1, 8899*w, replace = FALSE))
  temp1 <- temp1[!(temp1 %in% get(paste0("sample", w)))]
}
# Assign 10..18 to the nine samples and 19 to whatever is left in temp1.
temp2 <- newdata$emp_length[newdata$emp_length == "10+"]
temp2[sample0.18] <- 10
temp2[sample0.17] <- 11
temp2[sample0.15] <- 12
temp2[sample0.13] <- 13
temp2[sample0.1] <- 14
temp2[sample0.09] <- 15
temp2[sample0.07] <- 16
temp2[sample0.05] <- 17
temp2[sample0.03] <- 18
temp2[temp1] <- 19
newdata$emp_length[newdata$emp_length == "10+"] <- temp2
newdata$emp_length <- as.numeric(newdata$emp_length)
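The loop creates 9 random samples. I then use them to randomly assign the numbers 10 to 19 to the "10+" entries.
Is there a better way? Would it be better to just assign 10 to all the "10+" values?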
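For reference, the sampling step can probably be written much more compactly. The following is only a sketch: `emp_length` here is a small made-up vector standing in for `newdata$emp_length` after the `gsub()` steps, and the tenth weight (0.03) is my assumption for the leftover share that the loop gives to 19. Unlike the loop, this version matches the proportions only on average, not exactly.

```r
# Hypothetical stand-in for newdata$emp_length after the gsub() steps.
emp_length <- c("10+", "0.5", "10+", "10+", "1", "3", "8", "9", "4", "10+")

set.seed(1)
is_long <- emp_length == "10+"

# Weights from the loop, plus an assumed 0.03 for the leftover value 19.
weights <- c(0.18, 0.17, 0.15, 0.13, 0.10, 0.09, 0.07, 0.05, 0.03, 0.03)

# Draw one value from 10..19 for every "10+" entry, weighted as above.
drawn <- sample(10:19, size = sum(is_long), replace = TRUE, prob = weights)

emp_length[is_long] <- drawn
emp_length <- as.numeric(emp_length)
```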
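To make that simpler option concrete, here is a rough sketch of what I have in mind. `parse_emp_length` and `raw` are hypothetical names used only for illustration, and the 0.5 and 0 conventions are carried over from my script, not something the data dictates. Every "10+" entry simply becomes 10, so no random numbers are involved.

```r
# Hypothetical sample in the same format as the raw emp_length column.
raw <- c("10+ years", "< 1 year", "1 year", "3 years", "n/a")

parse_emp_length <- function(x) {
  # Keep only the digits: "10+ years" -> "10", "< 1 year" -> "1", "n/a" -> "".
  years <- as.numeric(gsub("[^0-9]", "", x))
  # Same conventions as my script: "< 1" becomes 0.5 and "n/a" becomes 0.
  years[grepl("<", x, fixed = TRUE)] <- 0.5
  years[x %in% "n/a"] <- 0
  years
}

parse_emp_length(raw)  # values: 10, 0.5, 1, 3, 0
```

With this approach 10 acts as a lower bound for the "10+" group and should not be read as an exact value.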
The data comes from: (https://www.lendingclub.com/info/download-data.action)