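Is there a simpler way to create these variables than what I am doing now? I am trying to turn each value of the vehicle type variable (and of a few other categorical variables) into a 0/1 variable of its own.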
##Set up boolean values##
norm.knnN$gearbox[norm.knnN$gearbox=="automatic"] = 1
norm.knnN$gearbox[norm.knnN$gearbox=="manual"] = 0
norm.knnN$gearbox = as.numeric(norm.knnN$gearbox)
norm.knnN$bus= ifelse(norm.knnN$vehicleType=="bus",1,0)
norm.knnN$cabrio= ifelse(norm.knnN$vehicleType=="cabrio",1,0)
norm.knnN$coupe= ifelse(norm.knnN$vehicleType=="coupe",1,0)
norm.knnN$limousine= ifelse(norm.knnN$vehicleType=="limousine",1,0)
norm.knnN$otherCar= ifelse(norm.knnN$vehicleType=="other",1,0)
norm.knnN$small_car= ifelse(norm.knnN$vehicleType=="small_car",1,0)
norm.knnN$station_wagon= ifelse(norm.knnN$vehicleType=="station_wagon",1,0)
norm.knnN$suv= ifelse(norm.knnN$vehicleType=="suv",1,0)
norm.knnN$vehicleType = NULL
norm.knnN$cng= ifelse(norm.knnN$fuelType=="cng",1,0)
norm.knnN$diesel= ifelse(norm.knnN$fuelType=="diesel",1,0)
norm.knnN$electric= ifelse(norm.knnN$fuelType=="electric",1,0)
norm.knnN$hybrid= ifelse(norm.knnN$fuelType=="hybrid",1,0)
norm.knnN$lpg= ifelse(norm.knnN$fuelType=="lpg",1,0)
norm.knnN$otherFuel= ifelse(norm.knnN$fuelType=="other",1,0)
norm.knnN$petrol= ifelse(norm.knnN$fuelType=="petrol",1,0)
norm.knnN$fuelType = NULL
norm.knnN$audi= ifelse(norm.knnN$brand=="audi",1,0)
norm.knnN$bmw= ifelse(norm.knnN$brand=="bmw",1,0)
norm.knnN$mercedes_benz= ifelse(norm.knnN$brand=="mercedes_benz",1,0)
norm.knnN$opel= ifelse(norm.knnN$brand=="opel",1,0)
norm.knnN$volkswagen= ifelse(norm.knnN$brand=="volkswagen",1,0)
norm.knnN$brand = NULL
norm.knnN$notRepairedDamage[norm.knnN$notRepairedDamage=="yes"] = 1
norm.knnN$notRepairedDamage[norm.knnN$notRepairedDamage=="no"] = 0
norm.knnN$notRepairedDamage = as.numeric(norm.knnN$notRepairedDamage)
Answer 0 (score: 2)
The cobalt package has a helper function called splitfactor() that splits factors into dummy variables. You would run the following:
norm.knnN <- cobalt::splitfactor(norm.knnN,
c("gearbox", "vehicleType",
"fuelType", "brand", "notRepairedDamage"),
drop.first = "if2")
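Setting drop.first = "if2" means that if a factor has exactly two values (e.g., "yes" and "no"), one of the two dummy variables is dropped, because it is fully redundant with the other.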
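If you would rather not add a dependency, the same idea can be sketched in base R with model.matrix(). This is a different technique from splitfactor(), not a reimplementation of it, and the toy data frame and its values below are made up purely for illustration:

```r
# Toy data standing in for norm.knnN (hypothetical values).
toy <- data.frame(
  gearbox  = factor(c("automatic", "manual", "manual")),
  fuelType = factor(c("diesel", "petrol", "lpg"))
)

# Two-level factor: a single 0/1 column carries all the information.
toy$automatic <- as.numeric(toy$gearbox == "automatic")

# Multi-level factor: "- 1" removes the intercept, so every level gets a column.
fuel_dummies <- model.matrix(~ fuelType - 1, data = toy)

# Strip the "fuelType" prefix that model.matrix() adds to the column names.
colnames(fuel_dummies) <- sub("^fuelType", "", colnames(fuel_dummies))

# Keep only the numeric indicator columns.
toy <- cbind(toy[, "automatic", drop = FALSE], fuel_dummies)
print(toy)
```

One caveat: with the default na.action, model.matrix() drops rows that contain missing values, so compare row counts before binding the result back onto the full data frame.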
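Answer 1 (score: 0)
I would suggest using dplyr syntax. There are tutorials that show how to use it, for example https://genomicsclass.github.io/book/pages/dplyr_tutorial.html
With dplyr, I usually use the case_when function to express multiple conditions instead of the ifelse function.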
norm.knnN$gearbox[norm.knnN$gearbox=="automatic"] = 1
norm.knnN$gearbox[norm.knnN$gearbox=="manual"] = 0
With the syntax I mentioned, that becomes:
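library(dplyr)
# Assign the result back; mutate() does not modify norm.knnN in place.
norm.knnN <- norm.knnN %>%
  mutate(gearbox = case_when(
    gearbox == "automatic" ~ 1,
    gearbox == "manual" ~ 0,
    # NA_real_ keeps every branch numeric; a bare NA is logical and
    # triggers a type error in older dplyr versions.
    TRUE ~ NA_real_))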
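To see how the fallback branch behaves, here is a small sketch on made-up data. The "semi" value and the gearbox_num column name are hypothetical and only there to show what happens to values that match neither condition:

```r
library(dplyr)

# Made-up values; "semi" is a hypothetical level not covered by any condition.
toy <- data.frame(gearbox = c("automatic", "manual", "semi", NA),
                  stringsAsFactors = FALSE)

toy <- toy %>%
  mutate(gearbox_num = case_when(
    gearbox == "automatic" ~ 1,
    gearbox == "manual" ~ 0,
    TRUE ~ NA_real_))  # everything else, including missing input, becomes NA

print(toy)
```

Writing to a new column here keeps the original text values around, which makes it easy to check the recoding before overwriting gearbox.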