Question

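I want to create a polynomial feature (GarageGrade) that combines garage quality (GarageQual) and garage condition (GarageCond) by multiplication. The values of GarageQual and GarageCond are given as character strings: Po (poor), Fa (fair), TA (typical), Gd (good), Ex (excellent).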

str(combi$GarageQual)

Returns: chr [1:2919] "TA" "TA" "TA" "TA" "TA" "TA" "TA" "TA" "Fa" "Gd" "TA" ...

str(combi$GarageCond)

Returns: chr [1:2919] "TA" "TA" "TA" "TA" "TA" "TA" "TA" "TA" "TA" "TA" "TA" "TA" ...

First, I converted them to factors:

combi$GarageQual <- factor(combi$GarageQual)
str(combi$GarageQual)

Returns: Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 5 5 5 5 5 5 2 3 ...

combi$GarageCond <- factor(combi$GarageCond)
str(combi$GarageCond)

Returns: Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 5 5 5 5 5 5 5 ...

Now I want to replace the vector of factor level names

c("NA", "Po", "Fa", "TA", "Gd", "Ex")

with the numeric vector

c(0, 1, 2, 3, 4, 5)

so that the two variables can be multiplied to create the combined feature, like this:

combi$GarageGrade <- combi$GarageQual * combi$GarageCond

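What is the best way to reach my end goal, a GarageGrade variable that combines GarageQual and GarageCond? Should I convert the values to factors first, or should I replace the characters with numbers directly? If so, how do I do that?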

Answer 1

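A straightforward approach is to create a vector of the five grades in the correct order and then use match to convert the grades to numbers.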

set.seed(22)
grades <- c( "Po", "Fa", "TA", "Gd", "Ex")
GarageQual <- sample(grades, 20, replace = TRUE)
GarageCond <- sample(grades, 20, replace = TRUE)

match(GarageQual, grades) * match(GarageCond, grades)

[1]  4  6 15 12 20 20 12 20  6  4  5  8 15  5 15  1 15  1  4  6
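The question also wants missing values to count as 0. A minimal sketch of one way to get that with the same match approach, assuming missing garages are stored as real NA values rather than the string "NA"; the small combi data frame below is a hypothetical stand-in for the asker's data, not taken from it:

```r
# Hypothetical stand-in for the asker's data frame `combi`
combi <- data.frame(
  GarageQual = c("TA", "Fa", NA, "Gd", "Ex"),
  GarageCond = c("TA", "TA", NA, "Po", "Ex"),
  stringsAsFactors = FALSE
)

grades <- c("Po", "Fa", "TA", "Gd", "Ex")

# nomatch = 0L maps NA (and any label not in `grades`) to 0 instead of NA
qual_score <- match(combi$GarageQual, grades, nomatch = 0L)
cond_score <- match(combi$GarageCond, grades, nomatch = 0L)

combi$GarageGrade <- qual_score * cond_score
combi$GarageGrade
# [1]  9  6  0  4 25
```

Note that nomatch = 0L also sends misspelled or unexpected labels to 0, so it may be worth checking unique(combi$GarageQual) first.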

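As long as the factor levels are specified so that they are in the correct order, you can also use an approach similar to the one you outlined (convert to factor, then to numeric).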

as.numeric(factor(GarageQual, levels = grades)) * as.numeric(factor(GarageCond, levels = grades))

[1]  4  6 15 12 20 20 12 20  6  4  5  8 15  5 15  1 15  1  4  6
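The levels argument matters because factor() sorts levels alphabetically by default, which is why the str() output in the question lists "Ex" first. A small sketch of the difference, using a made-up three-element vector x for illustration:

```r
x <- c("Po", "TA", "Ex")
grades <- c("Po", "Fa", "TA", "Gd", "Ex")

# Default levels are sorted alphabetically: "Ex" < "Po" < "TA"
default_codes <- as.numeric(factor(x))

# Explicit levels follow the intended quality order
ordered_codes <- as.numeric(factor(x, levels = grades))

default_codes
# [1] 2 3 1
ordered_codes
# [1] 1 3 5
```

With the default levels, "Ex" would get the lowest code, so the product would not reflect the intended grade.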

How do I create a polynomial feature from non-numeric variables?
