How to encode categorical variables as numbers in R

Time: 2015-03-10 10:46:14

Tags: r

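I am analyzing the CHFLS dataset in R, which is in the HSAUR2 package. I want to fit a linear model to this data to find out how the other variables affect the variable R_happy; I have coded R_happy so that 1 means "Very happy" and 0 otherwise. How can I code the remaining variables, for example R_region, as numbers so that I can use dummy variables and fit a linear model? I tried using as.numeric, but it didn't work. My code is as follows: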

# Load the necessary library

library("HSAUR2") #Load necessary library
data(CHFLS,package="HSAUR2") #Load the Chinese Health and Family Life Survey data

View(CHFLS) # Browse the data in a spreadsheet-style viewer
help("CHFLS") # Read details about the data, including the covariates

summary(CHFLS) #Produce a summary of the data

# Pie chart showing women's self-reported happiness
slices <- c(280, 1254)
lbls <- c("Very happy (18.25%)", "Otherwise(81.75%)")
pie(slices, labels=lbls)

# Define the variable of interest, y, which is 1 when R_happy is
# "Very happy" (the highest level) and 0 otherwise
y <- as.numeric(CHFLS$R_happy >= "Very happy")

# Append y onto the data and call the new data CHFLSnew
CHFLSnew<-cbind(CHFLS,y)

# Ensure that any categorical variables are coded as factors.
CHFLSnew$y<-as.factor(CHFLSnew$y)

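For what it's worth, `lm()` does not need categorical predictors to be recoded as numbers by hand: when a predictor is a factor, R expands it into 0/1 dummy columns internally (treatment contrasts by default, with the first level as the baseline). The sketch below uses a small made-up data frame whose columns `y` and `region` are hypothetical stand-ins for the 0/1 outcome and a variable such as `R_region`; it illustrates the mechanism and is not an analysis of CHFLS. Because `lm()` expects a numeric response, `y` is kept as 0/1 numbers here rather than a factor.

```r
# Made-up toy data: y is a 0/1 outcome, region is a hypothetical
# stand-in for a categorical predictor such as R_region
df <- data.frame(
  y = c(1, 0, 0, 1, 0, 1, 0, 0),
  region = factor(c("A", "B", "C", "A", "B", "C", "A", "B"))
)

# Fit the linear model directly on the factor; no manual recoding is needed.
# "A" (the first level) is the baseline, so regionB and regionC are the
# differences in mean y relative to region A.
fit <- lm(y ~ region, data = df)
print(coef(fit))
```

With a single factor predictor, the intercept is the mean of `y` in region A (2/3 here) and each dummy coefficient is another region's mean minus that baseline.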

1 Answer

Answer (score: 0)

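In general, if you want to convert a factor to numeric, index its levels by the factor rather than calling as.numeric() on it directly: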

f <- factor(1:10)
f
[1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 2 3 4 5 6 7 8 9 10

n <- as.numeric(levels(f)[f])
n
[1]  1  2  3  4  5  6  7  8  9 10
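The `levels(f)[f]` idiom works because indexing by a factor uses its integer codes, so each element is replaced by its level label before conversion. This matters when the labels are themselves numbers: `as.numeric()` applied directly to the factor returns the internal codes instead. When the labels are text, as they would be for a variable like `R_region`, there is no number to recover; `as.integer()` gives only arbitrary level codes, and `model.matrix()` shows the dummy columns a model would use. The values of `f` and the region names below are made up for illustration.

```r
# Numeric-looking labels: the levels are sorted as "5", "10", "20"
f <- factor(c(10, 5, 10, 20))
codes  <- as.numeric(f)             # internal level codes, not the values
values <- as.numeric(levels(f)[f])  # the original numbers
print(codes)   # [1] 2 1 2 3
print(values)  # [1] 10  5 10 20

# Text labels (hypothetical region names): there is no number to recover
region <- factor(c("North", "South", "East", "North"))
region_codes <- as.integer(region)  # codes follow alphabetical level order

# Dummy (indicator) columns, with the first level "East" as the baseline
dummies <- model.matrix(~ region)
print(dummies)
```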