Standardized mean difference in R

Posted: 2018-10-12 23:34:51

Tags: r mean multinomial standardized

I have a data frame like this:

region        group        mid_pop   
  1             2            1146      
  2             4            1682        
  3             3            2891       
  4             1            7654       
  5             1            3289       
  6             2            1128       
  7             3            2121       
  8             4            3217      
  9             3            1616       
 10             1            1717              

I ran a multinomial regression and obtained the probability of belonging to each group, as follows:

 library(nnet) # provides multinom()
 mlogit <- multinom(group ~ mid_pop, data = df) # df: assumed name for the data frame above
 probs <- predict(mlogit, type = "probs")

   probs1    probs2   probs3    probs4     
     0.2       0.3     0.4       0.1        
     0.3       0.4     0.15      0.15       
     0.4       0.1     0.3       0.2        
     0.7       0.1     0.1       0.1        
     0.2       0.3     0.4       0.1        
     0.6       0.1     0.1       0.2        
     0.7       0.1     0.1       0.1        
     0.3       0.2     0.1       0.4        
     0.2       0.1     0.1       0.6        
     0.1       0.2     0.1       0.6   

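Then I create a weight for each region. The weight is "the probability of belonging to group 1 divided by the probability of belonging to the group the region is actually in". I then multiply mid_pop by the weight.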

region        group        mid_pop    weight      mid_pop(weighted)
  1             2            1146      0.66           756.36
  2             4            1682       2              3364
  3             3            2891       2              5782
  4             1            7654       0.7            5357.8
  5             1            3289       0.2            657.8
  6             2            1128       0.3            338.4
  7             3            2121       0.7            1484.7
  8             4            3217       0.75           2412.75
  9             3            1616       0.33           533.28
 10             1            1717       0.16           274.72     
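The weight rule described above can be sketched in base R as follows. This is a minimal illustration, not the asker's code: it assumes `probs` is the matrix returned by `predict(..., type = "probs")` with columns named after the groups, and it uses only the first two regions of the tables above. The name `p_own` is introduced here for illustration.

```r
# First two regions of the question's data (groups 2 and 4).
group <- c(2, 4)
probs <- matrix(
  c(0.2, 0.3, 0.40, 0.10,
    0.3, 0.4, 0.15, 0.15),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("1", "2", "3", "4"))
)

# Probability of the group each region is actually in:
# a two-column index matrix selects one (row, column) cell per region.
p_own <- probs[cbind(seq_along(group), match(group, colnames(probs)))]

# Weight = P(group 1) / P(own group).
weight <- probs[, "1"] / p_own
```

For these two regions the weights are 2/3 and 2, which agrees with the 0.66 and 2 shown in the table.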

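Now I want to compute the standardized mean difference between groups and see how the mean of mid_pop differs before and after weighting. The result would look like this: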

SDM (group 1 vs. group 2)=....
SDM (group 1 vs. group 3)=....
SDM (group 1 vs. group 4)= ....

Can anyone help with this? Thanks in advance.

1 Answer:

Answer 0: (score: 1)

Use group_by() from the tidyverse to compute the mean of the weighted values within each group:

library(tidyverse)

df <-
  tibble(
    region = 1:10,
    group = c(2, 4, 3, 1, 1, 2, 3, 4, 3, 1),
    mid_pop = c(1146, 1682, 2891, 7654, 3289, 1128, 2121, 3217, 1616, 1717)
  ) # your data set

weight <- c(.66, 2, 2, .7, .2, .3, .7, .75, .33, .16)

df_wt <-
  df %>%
  bind_cols(weight = weight) %>%
  mutate(weighted = mid_pop * weight) %>% # your second data set: mid_pop(weighted)
  group_by(group) %>%
  summarise(pop = mean(weighted)) # average

## > df_wt
## # A tibble: 4 x 2
##   group   pop
##   <dbl> <dbl>
## 1     1 2097.
## 2     2  547.
## 3     3 2600.
## 4     4 2888.

The outer function with the "-" operation produces the pairwise differences between the group means:

wt_pop <- df_wt %>% select(pop) %>% pull()

outer(wt_pop, wt_pop, "-") # antisymmetric matrix: entry [i, j] is mean of group i minus mean of group j

##            [,1]     [,2]       [,3]       [,4]
## [1,]     0.0000 1549.393  -503.2200  -791.6017
## [2,] -1549.3933    0.000 -2052.6133 -2340.9950
## [3,]   503.2200 2052.613     0.0000  -288.3817
## [4,]   791.6017 2340.995   288.3817     0.0000

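Alternatively, you can apply outer at the end of the same pipeline. In that case you need a function such as as.data.frame() (or as_tibble(), as used here) to convert the resulting matrix to a data frame:
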
df %>%
  bind_cols(weight = weight) %>%
  mutate(weighted = mid_pop * weight) %>%
  group_by(group) %>%
  summarise(pop = mean(weighted)) %>%
  do(outer(.$pop, .$pop, "-") %>% as_tibble())

## # A tibble: 4 x 4
##       V1    V2     V3     V4
##    <dbl> <dbl>  <dbl>  <dbl>
## 1     0  1549.  -503.  -792.
## 2 -1549.    0  -2053. -2341.
## 3   503. 2053.     0   -288.
## 4   792. 2341.   288.     0
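
The matrices above hold raw differences in group means. A standardized mean difference also divides each difference by a pooled standard deviation. The following is a minimal base-R sketch of that step, not a definitive implementation. It assumes the common definition (mean of group 1 minus mean of group k) / sqrt((var_1 + var_k) / 2), and it treats "after weighting" the way the question does, as the mean of mid_pop * weight. The helper name `smd` is hypothetical.

```r
df <- data.frame(
  region  = 1:10,
  group   = c(2, 4, 3, 1, 1, 2, 3, 4, 3, 1),
  mid_pop = c(1146, 1682, 2891, 7654, 3289, 1128, 2121, 3217, 1616, 1717),
  weight  = c(.66, 2, 2, .7, .2, .3, .7, .75, .33, .16)
)
df$weighted <- df$mid_pop * df$weight

# Hypothetical helper: SMD of the reference group against every other group.
# Assumed definition: (mean_ref - mean_k) / sqrt((var_ref + var_k) / 2)
smd <- function(x, g, ref = "1") {
  m <- sapply(split(x, g), mean) # group means
  v <- sapply(split(x, g), var)  # group sample variances
  others <- setdiff(names(m), ref)
  out <- (m[[ref]] - m[others]) / sqrt((v[[ref]] + v[others]) / 2)
  names(out) <- paste("group", ref, "vs. group", others)
  out
}

smd_before <- smd(df$mid_pop,  df$group) # unweighted mid_pop
smd_after  <- smd(df$weighted, df$group) # mid_pop * weight

# One row per comparison, one column per scenario.
cbind(before = smd_before, after = smd_after)
```

Other denominators are in use as well, for example the standard deviation of the reference group alone, so the choice should follow whatever convention your analysis requires. With only two or three regions per group, the variance estimates are very unstable.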