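I started with a dataset of ungrouped (individual-level) data, which I converted to grouped data on the "Occupation" column. I now want to fit a logistic regression model to the grouped data, with "success" defined as the number of people who died in each occupation category.
Data used and grouping of the data
Before any grouping, my initial dataset looks like this: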
Occupation Education Age Died
1 household Secondary 39 no
2 farming primary 83 yes
3 farming primary 60 yes
4 farming primary 73 yes
5 farming Secondary 51 no
6 farming iliterate 62 yes
I then grouped the data using the following:
occu %>% group_by(Occupation, Died) %>% count()  # use this to group on the occupation of the suicide victims
This gives the following output:
Occupation Died n
<fct> <fct> <int>
1 business/service no 12
2 business/service yes 9
3 farming no 939
4 farming yes 1093
5 household no 154
6 household yes 94
7 others yes 3
8 others/unknown no 146
9 others/unknown yes 10
10 professional no 11
11 professional yes 26
12 retiree no 3
13 student no 27
14 student yes 8
15 unemployed no 23
16 unemployed yes 7
17 worker yes 6
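I then turned the counts above (stored in dt) into a table with the total and the proportion for each occupation, using the following: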
dt %>% group_by(Occupation) %>%
mutate(total=sum(n), prop=n/total)
which gives the output:
Occupation Died n total prop
<fct> <fct> <int> <int> <dbl>
1 business/service no 12 21 0.571
2 business/service yes 9 21 0.429
3 farming no 939 2032 0.462
4 farming yes 1093 2032 0.538
5 household no 154 248 0.621
6 household yes 94 248 0.379
7 others yes 3 3 1
8 others/unknown no 146 156 0.936
9 others/unknown yes 10 156 0.0641
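For reference, here is a rough sketch of how the long table above (one row per Occupation and Died combination) could be reshaped into one row per occupation with separate "yes" and "no" counts. It uses base R rather than dplyr so that it runs without extra packages. The data frame name dt_toy and the derived column names are made up for illustration; the counts are copied from the first rows of the output above.

```r
# Counts copied from the first rows of the grouped output above.
# `dt_toy` is a hypothetical name, not an object from the original code.
dt_toy <- data.frame(
  Occupation = c("business/service", "business/service",
                 "farming", "farming",
                 "household", "household",
                 "others"),
  Died = c("no", "yes", "no", "yes", "no", "yes", "yes"),
  n    = c(12, 9, 939, 1093, 154, 94, 3)
)

# Cross-tabulate: one row per occupation, one column per outcome.
# Combinations that never occur (e.g. others / no) become 0.
wide <- as.data.frame.matrix(xtabs(n ~ Occupation + Died, data = dt_toy))

# Totals and the proportion who died in each occupation.
wide$total     <- wide$no + wide$yes
wide$prop_died <- wide$yes / wide$total

print(wide)
```

In this layout, categories where only one outcome was observed (such as "others") show up explicitly with a zero count instead of being a missing row.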
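Question
My question is: how do I run a logistic regression model on this grouped data, using all three predictor variables from the original data (Education, Age, and the grouped Occupation), with Died = yes counted as success and Died = no counted as failure?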
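One point that may matter here: once the data are collapsed to counts per Occupation alone, Education and Age are no longer in the table, so they cannot enter the model. A minimal sketch of one possible approach, under that reading of the problem, is to group by every observed combination of the three predictors and pass glm() a two-column response of (successes, failures). Everything below is illustrative: occu_toy is simulated data standing in for occu, and the column names died and survived are invented for this example.

```r
# Simulated stand-in for `occu`; the values are made up for illustration.
set.seed(1)
n <- 200
occu_toy <- data.frame(
  Occupation = factor(sample(c("farming", "household", "student"), n, replace = TRUE)),
  Education  = factor(sample(c("primary", "Secondary"), n, replace = TRUE)),
  Age        = sample(c(30, 45, 60, 75), n, replace = TRUE)
)
p_died <- plogis(-3 + 0.05 * occu_toy$Age)
occu_toy$Died <- factor(ifelse(runif(n) < p_died, "yes", "no"),
                        levels = c("no", "yes"))

# Individual-level fit. With a factor response, glm() treats the
# first level ("no") as failure and the other level ("yes") as success.
fit_ind <- glm(Died ~ Occupation + Education + Age,
               family = binomial, data = occu_toy)

# Grouped data: one row per observed combination of the predictors,
# with the number who died and the number who did not.
occu_toy$died     <- as.integer(occu_toy$Died == "yes")
occu_toy$survived <- 1L - occu_toy$died
grouped <- aggregate(cbind(died, survived) ~ Occupation + Education + Age,
                     data = occu_toy, FUN = sum)

# Grouped fit: the response is a two-column matrix (successes, failures).
fit_grp <- glm(cbind(died, survived) ~ Occupation + Education + Age,
               family = binomial, data = grouped)

# The two fits should give the same coefficient estimates.
print(cbind(individual = coef(fit_ind), grouped = coef(fit_grp)))
```

The coefficients from the two fits should agree up to numerical tolerance, since the grouped binomial likelihood differs from the individual-level one only by a constant. The residual deviance and degrees of freedom will differ between the two forms, so those should not be compared directly.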