Grouping a data frame by the quartiles of a single variable contained in the data frame

Time: 2012-03-26 11:22:35

Tags: r aggregate

I have the swiss dataset that ships with R, which looks like this:

            Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary       80.2        17.0          15        12     9.96             22.2
Delemont         83.1        45.1           6         9    84.84             22.2
Franches-Mnt     92.5        39.7           5         5    93.40             20.2
    .              .           .            .         .      .                 . 
    .              .           .            .         .      .                 . 
    .              .           .            .         .      .                 . 

V. De Geneve     35.0         1.2          37        53    42.34             18.0
Rive Droite      44.7        46.6          16        29    50.43             18.2
Rive Gauche      42.8        27.7          22        29    58.33             19.3

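I would like to know whether there is a simple way to split the data into four groups, one for each quartile of the Education variable, and then get the corresponding Infant.Mortality for each province, so that I end up with something like this: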

       Group1stQ           Group2ndQ           Group3rdQ          Group4thQ 

   <Mortality for      <Mortality for       <Mortality for     <Mortality for
     1st province        1st province         1st province       1st province
     on this group>     on this group>       on this group>     on this group>

   <Mortality for      <Mortality for       <Mortality for     <Mortality for
     2nd province        2nd province         2nd province       2nd province
     on this group>     on this group>       on this group>     on this group>

   <Mortality for      <Mortality for       <Mortality for     <Mortality for
     3rd province        3rd province         3rd province       3rd province
     on this group>     on this group>       on this group>     on this group>
          .                  .                    .                  .
          .                  .                    .                  .
          .                  .                    .                  .

Thanks in advance for your help!!!

1 Answer:

Answer 0: (score: 4)

How about:

> swiss$qEdu <- cut (swiss$Education, 
                     breaks = quantile (swiss$Education, c (0, .25, .5, .75, 1)), 
                     include.lowest = TRUE)

> aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = mean)
     qEdu        x
1   [1,6] 19.31429
2   (6,8] 21.93636
3  (8,12] 19.38182
4 (12,53] 19.30909
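
A side note on the `include.lowest = TRUE` argument used above, as a minimal sketch: by default `cut ()` builds intervals that are open on the left, so any value equal to the smallest break would fall outside the first interval and become `NA`. The variable names `edu`, `breaks`, `q_open` and `q_closed` are only illustrative and are not part of the original answer.

```r
# swiss ships with base R (package datasets), so nothing needs to be loaded
edu    <- swiss$Education
breaks <- quantile(edu, c(0, .25, .5, .75, 1))

# Default intervals are (a, b]: the minimum of edu is not in the first interval
q_open <- cut(edu, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left: [a, b]
q_closed <- cut(edu, breaks = breaks, include.lowest = TRUE)

sum(is.na(q_open))    # at least one province ends up without a group
sum(is.na(q_closed))  # 0
table(q_closed)       # number of provinces per quartile group
```

If two of the quantiles happened to coincide (not the case for Education here), `cut ()` would stop with an error because the breaks are not unique, so this approach assumes distinct quartile values.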

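(I don't really know what your numbers are supposed to be; they don't agree with the means I get.)

(That was before the edit...)

(After the 2nd edit:) If you want the Infant.Mortality of every province that falls into each quartile of Education, use list () as the aggregation function: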

>  aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = list)
     qEdu                                                                                  x
1   [1,6] 20.2, 24.5, 18.7, 21.2, 22.4, 15.3, 21.0, 18.0, 15.1, 19.8, 18.3, 19.4, 20.2, 16.3
2   (6,8]                   20.3, 26.6, 23.6, 24.9, 21.0, 19.1, 20.0, 23.8, 22.5, 20.0, 19.5
3  (8,12]                   22.2, 22.2, 16.5, 22.7, 20.0, 18.0, 16.7, 16.3, 17.8, 20.3, 20.5
4 (12,53]                   20.6, 24.4, 20.2, 10.8, 20.9, 18.1, 18.9, 23.0, 18.0, 18.2, 19.3

Or:

> Infant.Mortality <- lapply (levels (swiss$qEdu), function (x) swiss$Infant.Mortality [swiss$qEdu == x])
> names (Infant.Mortality) <- levels (swiss$qEdu)
> Infant.Mortality
$`[1,6]`
 [1] 20.2 24.5 18.7 21.2 22.4 15.3 21.0 18.0 15.1 19.8 18.3 19.4 20.2 16.3

$`(6,8]`
 [1] 20.3 26.6 23.6 24.9 21.0 19.1 20.0 23.8 22.5 20.0 19.5

$`(8,12]`
 [1] 22.2 22.2 16.5 22.7 20.0 18.0 16.7 16.3 17.8 20.3 20.5

$`(12,53]`
 [1] 20.6 24.4 20.2 10.8 20.9 18.1 18.9 23.0 18.0 18.2 19.3
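
As a possible shortcut, base R's `split ()` should give the same named list in one call, and padding the groups with `NA` gets close to the four-column layout asked for in the question. This is only a sketch: the names `mortality`, `groups`, `n_max` and `wide` are made up for illustration, and `qEdu` is recomputed here so the block runs on its own.

```r
# Quartile groups of Education, built the same way as above
qEdu <- cut(swiss$Education,
            breaks = quantile(swiss$Education, c(0, .25, .5, .75, 1)),
            include.lowest = TRUE)

# Name each value by its province so the names survive the split
mortality <- setNames(swiss$Infant.Mortality, rownames(swiss))

# One vector of Infant.Mortality values per quartile group
groups <- split(mortality, qEdu)

# The groups differ in size, so pad the shorter ones with NA:
# indexing past the end of a vector returns NA
n_max <- max(lengths(groups))
wide  <- sapply(groups, function(v) unname(v)[seq_len(n_max)])

wide          # one column per quartile group
groups[[1]]   # first group, with province names attached
```

The matrix drops the province names, because the rows of different columns belong to different provinces; the list `groups` keeps them.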