I have a data frame similar to this one:
BMI<-c(13.4,14,15.6,16,13.4,12.9,17.7,18.3,17,16.5)
sport<-c(1,2,2,3,2,1,1,3,1,2)
social<-c("low","middle","middle","low","high","low","middle","middle","high","middle")
smoker<-c(1,0,0,1,2,3,2,2,2,1)
status<-c("low","high","low","middle","low","middle","middle","middle","high","low")
social<-as.factor(social)
status<-as.factor(status)
sport<-as.integer(sport)
smoker<-as.integer(smoker)
df<-data.frame(BMI,sport,social,status,smoker)
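I want to run a multiple linear regression with "BMI" as the response, but I don't know how to handle the categorical variables or, more generally, predictors stored in different formats.
How should I transform these variables to get a meaningful result?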
Answer 0: (score: 1)
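You can use a generalized linear model and mark categorical variables with factor() (with the gaussian family, which is the default, glm fits the same model as lm), for example: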
glm(data=iris,formula=Sepal.Width~Sepal.Length+Petal.Length+factor(Species))
With your data:
glm(data=df,BMI~sport+social+status+smoker,family="gaussian")
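Note that the formula above treats `sport` and `smoker` as numeric, because they were stored as integers. If those codes are really category labels (an assumption; the question does not say), wrapping them in `factor()` makes R estimate one coefficient per non-reference level instead of a single slope. A minimal sketch, with part of the question's data repeated so it runs on its own; the names `fit_num` and `fit_cat` are only illustrative, and the model is kept small because there are only 10 rows:

```r
# Data from the question, repeated so the block is self-contained
df <- data.frame(
  BMI    = c(13.4, 14, 15.6, 16, 13.4, 12.9, 17.7, 18.3, 17, 16.5),
  sport  = as.integer(c(1, 2, 2, 3, 2, 1, 1, 3, 1, 2)),
  social = factor(c("low", "middle", "middle", "low", "high",
                    "low", "middle", "middle", "high", "middle"))
)

# sport treated as a number: a single slope for sport
fit_num <- lm(BMI ~ sport + social, data = df)

# sport treated as a category (assumption): one coefficient
# for each level other than the reference level
fit_cat <- lm(BMI ~ factor(sport) + social, data = df)

names(coef(fit_num))
names(coef(fit_cat))
```

Each categorical level costs one parameter, so with a sample this small it is easy to run out of degrees of freedom if every predictor is converted to a factor.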
If you want to use lm instead:
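library(dplyr)
# Convert any character columns to factors
# (a no-op for df as built above, since social and status are already factors)
df1 <- df %>%
  mutate(across(where(is.character), as.factor))
lm(BMI ~ sport + social + status + smoker, data = df1)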
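To see what lm does with a factor, it can help to look at the design matrix: each factor is expanded into 0/1 dummy columns, with the first level (alphabetical by default) absorbed into the intercept. The sketch below uses only `BMI` and `social` from the question; choosing "low" as the baseline is an illustrative assumption, not something the question asks for.

```r
# Subset of the question's data, repeated so the block is self-contained
df <- data.frame(
  BMI    = c(13.4, 14, 15.6, 16, 13.4, 12.9, 17.7, 18.3, 17, 16.5),
  social = factor(c("low", "middle", "middle", "low", "high",
                    "low", "middle", "middle", "high", "middle"))
)

# The dummy (treatment) coding that lm() builds internally
X <- model.matrix(BMI ~ social, data = df)
colnames(X)

# Choose a different reference level (assumption: "low" is the baseline of interest)
df$social <- relevel(df$social, ref = "low")
fit <- lm(BMI ~ social, data = df)

# The intercept is now the mean BMI of the "low" group;
# the other coefficients are differences from that group
coef(fit)
```

The coefficients of a factor are always read relative to its reference level, so picking a sensible baseline is part of getting a meaningful result.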