I have a data frame like the following:
region group mid_pop
1 2 1146
2 4 1682
3 3 2891
4 1 7654
5 1 3289
6 2 1128
7 3 2121
8 4 3217
9 3 1616
10 1 1717
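I fit a multinomial logistic regression and obtained the probability of belonging to each group:
library(nnet)  # multinom() comes from the nnet package
mlogit <- multinom(group ~ mid_pop)
probs <- predict(mlogit, type="probs")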
probs1 probs2 probs3 probs4
0.2 0.3 0.4 0.1
0.3 0.4 0.15 0.15
0.4 0.1 0.3 0.2
0.7 0.1 0.1 0.1
0.2 0.3 0.4 0.1
0.6 0.1 0.1 0.2
0.7 0.1 0.1 0.1
0.3 0.2 0.1 0.4
0.2 0.1 0.1 0.6
0.1 0.2 0.1 0.6
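Then I create a weight for each region. The weight is the probability of belonging to group 1 divided by the probability of belonging to the group the region is actually in. The weight is then multiplied by mid_pop.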
region group mid_pop weight mid_pop(weighted)
1 2 1146 0.66 756.36
2 4 1682 2 3364
3 3 2891 2 5782
4 1 7654 0.7 5357.8
5 1 3289 0.2 657.8
6 2 1128 0.3 338.4
7 3 2121 0.7 1484.7
8 4 3217 0.75 2412.75
9 3 1616 0.33 533.28
10 1 1717 0.16 274.72
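The weighting rule described above can be sketched in base R as follows. This is a minimal sketch, assuming the columns of probs are ordered by group 1 to 4; the names p_own and weighted are illustrative and do not come from the question. Because it applies the rule to the listed probabilities directly, its weights need not match the illustrative figures in the table above.

```r
# Data as listed in the question.
group   <- c(2, 4, 3, 1, 1, 2, 3, 4, 3, 1)
mid_pop <- c(1146, 1682, 2891, 7654, 3289, 1128, 2121, 3217, 1616, 1717)

# Predicted probabilities (rows = regions, columns = groups 1..4).
probs <- matrix(c(
  0.2, 0.3, 0.4,  0.1,
  0.3, 0.4, 0.15, 0.15,
  0.4, 0.1, 0.3,  0.2,
  0.7, 0.1, 0.1,  0.1,
  0.2, 0.3, 0.4,  0.1,
  0.6, 0.1, 0.1,  0.2,
  0.7, 0.1, 0.1,  0.1,
  0.3, 0.2, 0.1,  0.4,
  0.2, 0.1, 0.1,  0.6,
  0.1, 0.2, 0.1,  0.6
), ncol = 4, byrow = TRUE)

# Probability of each region's own group: row i, column group[i].
p_own <- probs[cbind(seq_along(group), group)]

# Weight = P(group 1) / P(own group); regions already in group 1 get weight 1.
weight   <- probs[, 1] / p_own
weighted <- mid_pop * weight
```

Indexing a matrix with a two-column matrix of (row, column) pairs picks one cell per region, which avoids an explicit loop.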
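Now I want to compute standardized mean differences between the groups and see how the group means of mid_pop differ before and after weighting. The result would look like this: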
SDM (group 1 vs. group 2)=....
SDM (group 1 vs. group 3)=....
SDM (group 1 vs. group 4)= ....
Can anyone help with this? Thanks in advance.
Answer 0 (score: 1)
Use group_by from the tidyverse library:
library(tidyverse)
df <-
tibble(
region = 1:10,
group = c(2, 4, 3, 1, 1, 2, 3, 4, 3, 1),
mid_pop = c(1146, 1682, 2891, 7654, 3289, 1128, 2121, 3217, 1616, 1717)
) # your data set
weight <- c(.66, 2, 2, .7, .2, .3, .7, .75, .33, .16)
df_wt <-
df %>%
bind_cols(weight = weight) %>%
mutate(weighted = mid_pop * weight) %>% # your second data set: mid_pop(weighted)
group_by(group) %>%
summarise(pop = mean(weighted)) # average
## > df_wt
## # A tibble: 4 x 2
## group pop
## <dbl> <dbl>
## 1 1 2097.
## 2 2 547.
## 3 3 2600.
## 4 4 2888.
The outer function with the "-" operator gives the pairwise differences:
wt_pop <- df_wt %>% pull(pop)
outer(wt_pop, wt_pop, "-") # antisymmetric matrix: row group mean minus column group mean
## [,1] [,2] [,3] [,4]
## [1,] 0.0000 1549.393 -503.2200 -791.6017
## [2,] -1549.3933 0.000 -2052.6133 -2340.9950
## [3,] 503.2200 2052.613 0.0000 -288.3817
## [4,] 791.6017 2340.995 288.3817 0.0000
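The matrix above has no group labels on its rows and columns. As a small sketch of the same step in base R, assuming the same data and weights as above, tapply returns a named vector of group means, and outer carries those names through to the result:

```r
group   <- c(2, 4, 3, 1, 1, 2, 3, 4, 3, 1)
mid_pop <- c(1146, 1682, 2891, 7654, 3289, 1128, 2121, 3217, 1616, 1717)
weight  <- c(.66, 2, 2, .7, .2, .3, .7, .75, .33, .16)

# Mean of the weighted values per group, named by group label.
wt_pop <- tapply(mid_pop * weight, group, mean)

# Pairwise differences: entry [i, j] is mean of group i minus mean of group j.
diffs <- outer(wt_pop, wt_pop, "-")
print(diffs)

# Group 1 versus groups 2, 3 and 4 only.
print(diffs["1", c("2", "3", "4")])
```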
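Alternatively, you can apply outer directly in the same pipeline.
The resulting matrix has to be converted to a data frame (as_tibble() here, or as.data.frame()), because do() expects one: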
df %>%
bind_cols(weight = weight) %>%
mutate(weighted = mid_pop * weight) %>%
group_by(group) %>%
summarise(pop = mean(weighted)) %>%
do(outer(.$pop, .$pop, "-") %>% as_tibble())
## # A tibble: 4 x 4
## V1 V2 V3 V4
## <dbl> <dbl> <dbl> <dbl>
## 1 0 1549. -503. -792.
## 2 -1549. 0 -2053. -2341.
## 3 503. 2053. 0 -288.
## 4 792. 2341. 288. 0
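The matrices above contain raw differences in group means. A standardized mean difference also divides by a standard deviation. The following is a minimal sketch, not a definitive method: it assumes the denominator is the pooled standard deviation of the two groups being compared, and it follows the question in treating mid_pop * weight as the "after weighting" variable. The helper smd_vs_group1 is a hypothetical name introduced for illustration.

```r
group   <- c(2, 4, 3, 1, 1, 2, 3, 4, 3, 1)
mid_pop <- c(1146, 1682, 2891, 7654, 3289, 1128, 2121, 3217, 1616, 1717)
weight  <- c(.66, 2, 2, .7, .2, .3, .7, .75, .33, .16)

# Hypothetical helper: SMD of x between group 1 and group g.
# Assumption: pooled SD of the two groups, sqrt((var_1 + var_g) / 2).
smd_vs_group1 <- function(x, group, g) {
  x1 <- x[group == 1]
  xg <- x[group == g]
  pooled_sd <- sqrt((var(x1) + var(xg)) / 2)
  (mean(x1) - mean(xg)) / pooled_sd
}

others <- c(2, 3, 4)
smd <- data.frame(
  comparison = paste("group 1 vs. group", others),
  before = sapply(others, function(g) smd_vs_group1(mid_pop, group, g)),
  after  = sapply(others, function(g) smd_vs_group1(mid_pop * weight, group, g))
)
print(smd)
```

Other denominators are also in use, for example the standard deviation of the reference group alone, so the pooled form here is only one option. With two or three regions per group, the variance estimates are very unstable and the resulting values should be read with caution.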