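I'm new to R, and for my assignment I'm trying to generate dummy variables for several categorical variables (3 variables in total). However, I ran into a problem with each method I tried:
Method 1: following https://stats.idre.ucla.edu/r/modules/coding-for-categorical-variables-in-regression-models/ Code: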
> housing_prices2$Fuel.Type.f <- factor(housing_prices2$Fuel.Type)
> is.factor(housing_prices2$Fuel.Type.f)
[1] TRUE
> housing_prices2$Fuel.Type.f[1:10]
[1] Electric Gas Gas Gas Gas Gas Oil
[8] Oil Electric Gas
Levels: Electric Gas None Oil Solar Unknown/Other Wood
That works fine. However, I ran into a problem on the next line:
> summary(lm(write ~ Fuel.Type.f, data = housing_prices2))
Error in model.frame.default(formula = write ~ Fuel.Type.f, data = housing_prices2,: object is not a matrix
I don't understand this error; it makes no sense to me, so I decided to try another method.
Method 2: following "Convert categorical variables to numeric in R"
For the variable Fuel.Type, it works fine:
> Fuel.Type <- as.factor(c("Electric", "Gas", "None", "Oil", "Solar", "Unknown/Other",
+ "Wood"))
> Fuel.Type
[1] Electric Gas None Oil Solar
[6] Unknown/Other Wood
Levels: Electric Gas None Oil Solar Unknown/Other Wood
> unclass(Fuel.Type)
[1] 1 2 3 4 5 6 7
attr(,"levels")
[1] "Electric" "Gas" "None" "Oil"
[5] "Solar" "Unknown/Other" "Wood"
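Note that unclass() returns the integer codes of the levels (1 to 7 here), not dummy (indicator) columns. As a minimal sketch of one way to get indicator columns in base R, assuming a small made-up vector `fuel` rather than the real housing data, model.matrix() can expand a factor into treatment-coded dummies:

```r
# Made-up sample values, not the real housing data
fuel <- factor(c("Electric", "Gas", "Gas", "Oil"))

# model.matrix() builds an intercept plus one 0/1 column per
# non-reference level; drop the intercept to keep only the dummies
dummies <- model.matrix(~ fuel)[, -1, drop = FALSE]
print(dummies)
```

The first level ("Electric" here) serves as the reference category, so it has no column of its own.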
But when I tried to generate dummy variables for the other variables, I got this error:
> housing_prices2$Heat.Type.f[1:10]
NULL
Warning message:
Unknown or uninitialised column: 'Heat.Type.f'.
I have no idea what these errors mean either... Any suggestions are appreciated!
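The "Unknown or uninitialised column" warning indicates that the data object has no column named Heat.Type.f at that point, which is what happens if the factor column was never created in that particular object. A minimal sketch of creating a ".f" factor column for each of the three character columns and checking that it exists, assuming a made-up base data frame named `housing` that only mirrors the column names of the real data:

```r
# Made-up rows; only the column names follow the real data
housing <- data.frame(
  Fuel.Type  = c("Electric", "Gas", "Gas", "Gas"),
  Heat.Type  = c("Electric", "Hot Water", "Hot Water", "Hot Air"),
  Sewer.Type = c("Private", "Private", "Public", "Private"),
  stringsAsFactors = FALSE
)

# Create a ".f" factor column for each character column
for (col in c("Fuel.Type", "Heat.Type", "Sewer.Type")) {
  housing[[paste0(col, ".f")]] <- factor(housing[[col]])
}

# Confirm the column exists in this object before using it
stopifnot("Heat.Type.f" %in% names(housing))
print(levels(housing$Heat.Type.f))
```

Checking names() on the exact object passed to later calls helps catch the case where the column was added to a different copy of the data.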
By the way, here is a sample of my data:
>$ Fuel.Type : chr "Electric" "Gas" "Gas" "Gas"
>$ Heat.Type : chr "Electric" "Hot Water" "Hot Water" "Hot Air"
>$ Sewer.Type : chr "Private" "Private" "Public" "Private"
Answer 0 (score: 0)
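I figured out my problem last night. I had created a new data object called hp2, and so I mixed up the data objects:
hp2 <- read_excel("Desktop/hw/424/hw1/housing_prices2.xlsx")
In addition, I also got the Y variable wrong; see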
summary(lm(write ~ Fuel.Type.f, data = housing_prices2))
My Y variable is not actually called write.
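A likely explanation for the confusing message, which is not confirmed in the post itself: when write is not a column of the data, R looks further and finds the base function write(), which cannot serve as a response. A minimal sketch of checking the formula variables against the data before fitting, assuming made-up data and a hypothetical response column `Price`:

```r
# Made-up data; `Price` is a hypothetical response column, not from the post
housing <- data.frame(
  Price     = c(200, 250, 240, 260, 180, 300),
  Fuel.Type = c("Electric", "Gas", "Gas", "Oil", "Electric", "Oil")
)
housing$Fuel.Type.f <- factor(housing$Fuel.Type)

# The original formula refers to a variable that is not in the data
bad_vars <- setdiff(all.vars(write ~ Fuel.Type.f), names(housing))
print(bad_vars)

# Check that every variable in the intended formula is a real column
model_formula <- Price ~ Fuel.Type.f
missing_vars <- setdiff(all.vars(model_formula), names(housing))
stopifnot(length(missing_vars) == 0)

# lm() expands the factor into dummy variables automatically
fit <- lm(model_formula, data = housing)
print(coef(fit))
```

With a factor predictor, lm() handles the dummy coding itself, so no manual conversion to numeric codes is needed for the regression.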