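I'm new to R and need to know how to hold the variable "gender" constant at its mean, so that I can make predictions after fitting a Poisson regression to doctor-visit data. Here is a sample of my data: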
visits gender illness age.category
1 female 1 <30
1 female 1 <30
1 male 3 <30
1 male 1 <30
1 male 2 <30
1 female 5 <30
1 female 4 <30
1 female 3 <30
1 female 2 <30
1 male 1 <30
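I have been given an example (see below) of how to predict the two-week doctor visit rate for males and females, while holding age and illness constant.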
sex <- factor(c('female', 'male'))
avg.age <- mean(DoctorVisits$age)
avg.illness <- mean(DoctorVisits$illness)
hypothetical.person <- expand.grid(age = avg.age,
                                   gender = sex,
                                   illness = avg.illness)
predict(M.dr,
        newdata = hypothetical.person,
        type = 'response')
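But I need to predict the two-week doctor visit rate while holding gender and illness constant. However, I don't know how to hold gender constant at its mean. How can I do this?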
Answer 0 (score: 0)
Here is how I would create a data.frame covering every combination of gender and age group, with illness set to the mean illness for that gender.
xy <- read.table(text = "visits gender illness age.category
1 female 1 <30
1 female 1 <30
1 male 3 <30
1 male 1 <30
1 male 2 <30
1 female 5 <30
1 female 4 <30
1 female 3 <30
1 female 2 <30
1 male 1 <30", header = TRUE)
xy
sex <- factor(c('female', 'male'))
# Age-group labels; for predict() these must match the factor levels used in the fitted model
age.groups <- c("< 30", "30-50", "> 50")
avg.illness.by.gender <- aggregate(illness ~ gender, data = xy, FUN = mean)
out <- expand.grid(gender = sex, age = age.groups)
out[out$gender == "female", "illness"] <- avg.illness.by.gender[avg.illness.by.gender$gender == "female", "illness"]
out[out$gender == "male", "illness"] <- avg.illness.by.gender[avg.illness.by.gender$gender == "male", "illness"]
out
  gender   age  illness
1 female  < 30 2.666667
2   male  < 30 1.750000
3 female 30-50 2.666667
4   male 30-50 1.750000
5 female  > 50 2.666667
6   male  > 50 1.750000
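The data.frame above still varies gender. A factor has no mean, so one way to hold gender "at its mean" is to recode it as a 0/1 indicator, fit the model on that indicator, and set it to the sample proportion in newdata. Below is a minimal sketch under stated assumptions: the data are simulated (the ten-row sample has only one age category and cannot support a fit), and the names dat, male, fit and newdata are hypothetical, not from the question.

```r
# Simulated stand-in for the doctor-visit data (an assumption, not the real data)
set.seed(1)
n <- 200
age.levels <- c("<30", "30-50", ">50")
dat <- data.frame(
  gender       = factor(sample(c("female", "male"), n, replace = TRUE)),
  age.category = factor(sample(age.levels, n, replace = TRUE), levels = age.levels),
  illness      = sample(0:5, n, replace = TRUE)
)
dat$visits <- rpois(n, lambda = exp(-1 + 0.2 * dat$illness))

# Recode gender as a 0/1 indicator so that it has a meaningful mean
dat$male <- as.numeric(dat$gender == "male")

# Poisson regression using the indicator instead of the factor
fit <- glm(visits ~ male + illness + age.category, family = poisson, data = dat)

# One row per age category; gender and illness held at their sample means
newdata <- data.frame(
  age.category = factor(age.levels, levels = age.levels),
  male         = mean(dat$male),
  illness      = mean(dat$illness)
)
newdata$predicted.visits <- predict(fit, newdata = newdata, type = "response")
newdata
```

Note that this gives the prediction for an "average" person on the scale of the linear predictor. Averaging the separate male and female predictions, weighted by their sample proportions, is a different quantity and will generally give slightly different numbers.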