Question

I have the swiss dataset that ships with R, which looks like this:

            Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary       80.2        17.0          15        12     9.96             22.2
Delemont         83.1        45.1           6         9    84.84             22.2
Franches-Mnt     92.5        39.7           5         5    93.40             20.2
    .              .           .            .         .      .                 . 
    .              .           .            .         .      .                 . 
    .              .           .            .         .      .                 . 

V. De Geneve     35.0         1.2          37        53    42.34             18.0
Rive Droite      44.7        46.6          16        29    50.43             18.2
Rive Gauche      42.8        27.7          22        29    58.33             19.3

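I'd like to know whether there is a simple way to split the data into four groups, one for each quartile of the Education variable, and then get the corresponding Infant.Mortality of each province, so that I end up with something like this: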

       Group1stQ           Group2ndQ           Group3rdQ          Group4thQ 

   <Mortality for      <Mortality for       <Mortality for     <Mortality for
     1st province        1st province         1st province       1st province
     on this group>     on this group>       on this group>     on this group>

   <Mortality for      <Mortality for       <Mortality for     <Mortality for
     2nd province        2nd province         2nd province       2nd province
     on this group>     on this group>       on this group>     on this group>

   <Mortality for      <Mortality for       <Mortality for     <Mortality for
     3rd province        3rd province         3rd province       3rd province
     on this group>     on this group>       on this group>     on this group>
          .                  .                    .                  .
          .                  .                    .                  .
          .                  .                    .                  .

Thanks in advance for your help!!!

Answer 1

How about:

> swiss$qEdu <- cut (swiss$Education, 
                     breaks = quantile (swiss$Education, c (0, .25, .5, .75, 1)), 
                     include.lowest = TRUE)

> aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = mean)
     qEdu        x
1   [1,6] 19.31429
2   (6,8] 21.93636
3  (8,12] 19.38182
4 (12,53] 19.30909
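A quick check of why include.lowest = TRUE matters in the cut() call above. This is a sketch using only base R and the built-in swiss data; the variable names (edu, brks, q_inc, q_no) are illustrative. cut() builds right-closed intervals such as (1,6], and the lowest break is the 0% quantile, i.e. the minimum Education value itself, so without include.lowest the province(s) at that minimum would be assigned NA.

```r
# Base R only; `swiss` comes from the datasets package, attached by default.
edu <- swiss$Education

# 0%, 25%, 50%, 75% and 100% quantiles used as break points
brks <- quantile(edu, probs = c(0, 0.25, 0.5, 0.75, 1))

# include.lowest = TRUE closes the first interval on the left, e.g. [1,6]
q_inc <- cut(edu, breaks = brks, include.lowest = TRUE)

# Default right-closed intervals: (1,6] excludes the minimum, which becomes NA
q_no <- cut(edu, breaks = brks)

print(table(q_inc, useNA = "ifany"))  # group sizes; no NA entry expected
print(sum(is.na(q_no)))               # provinces lost without include.lowest
```

If the data had many ties, some quantiles could coincide and cut() would stop with a "breaks are not unique" error; that does not happen for Education here.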

(I don't really know what your numbers are; they don't match the means I get.)

(That was before the edit...)

(After the 2nd edit:) If you want the Infant.Mortality of every province that falls into each quartile of Education, use list() as the aggregation function:

>  aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = list)
     qEdu                                                                                  x
1   [1,6] 20.2, 24.5, 18.7, 21.2, 22.4, 15.3, 21.0, 18.0, 15.1, 19.8, 18.3, 19.4, 20.2, 16.3
2   (6,8]                   20.3, 26.6, 23.6, 24.9, 21.0, 19.1, 20.0, 23.8, 22.5, 20.0, 19.5
3  (8,12]                   22.2, 22.2, 16.5, 22.7, 20.0, 18.0, 16.7, 16.3, 17.8, 20.3, 20.5
4 (12,53]                   20.6, 24.4, 20.2, 10.8, 20.9, 18.1, 18.9, 23.0, 18.0, 18.2, 19.3
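The output above still has one row per quartile group, whereas the question sketches one column per group. A minimal sketch of that layout, using base R only (groups, n_max and wide are hypothetical names): split the mortality values by the quartile factor, then index every group up to the size of the largest one. Indexing past the end of a vector returns NA, so the shorter groups get padded and sapply() can bind them into a matrix.

```r
# Base R only; recreate the quartile factor from the answer
swiss$qEdu <- cut(swiss$Education,
                  breaks = quantile(swiss$Education, c(0, .25, .5, .75, 1)),
                  include.lowest = TRUE)

# Named list: one numeric vector of Infant.Mortality per quartile group
groups <- split(swiss$Infant.Mortality, swiss$qEdu)

# Pad each group with NA up to the largest group size, one column per group
n_max <- max(lengths(groups))
wide <- sapply(groups, function(v) v[seq_len(n_max)])

print(wide)
```

Within each column the provinces keep their original row order; the NA cells are only padding, not missing measurements.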

Or:

> Infant.Mortality <- lapply (levels (swiss$qEdu), function (x) swiss$Infant.Mortality [swiss$qEdu == x])
> names (Infant.Mortality) <- levels (swiss$qEdu)
> Infant.Mortality
$`[1,6]`
 [1] 20.2 24.5 18.7 21.2 22.4 15.3 21.0 18.0 15.1 19.8 18.3 19.4 20.2 16.3

$`(6,8]`
 [1] 20.3 26.6 23.6 24.9 21.0 19.1 20.0 23.8 22.5 20.0 19.5

$`(8,12]`
 [1] 22.2 22.2 16.5 22.7 20.0 18.0 16.7 16.3 17.8 20.3 20.5

$`(12,53]`
 [1] 20.6 24.4 20.2 10.8 20.9 18.1 18.9 23.0 18.0 18.2 19.3
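Base R's split() should give the same named list as the lapply()/names() pair above in a single call. A sketch for comparison, base R only; by_split and by_lapply are illustrative names:

```r
# Base R only; rebuild the quartile factor used in the answer
qEdu <- cut(swiss$Education,
            breaks = quantile(swiss$Education, c(0, .25, .5, .75, 1)),
            include.lowest = TRUE)

# One call: a list named by the factor levels, values kept in row order
by_split <- split(swiss$Infant.Mortality, qEdu)

# The answer's two-step version, for comparison
by_lapply <- lapply(levels(qEdu), function(x) swiss$Infant.Mortality[qEdu == x])
names(by_lapply) <- levels(qEdu)

print(all.equal(by_split, by_lapply))
```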

Group a data frame by the quartiles of a single variable it contains
