Using the dcast() function after dplyr summarize()

Time: 2019-02-13 05:30:29

Tags: r dplyr tidyr

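I have a dataset of about 4,000 scores from four different archery events. The dataset contains two equipment classes: Compound and Recurve. I need to show some summary statistics grouped by Event, but spread across the table by Class.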

Here is some sample data:

> results
# A tibble: 4,478 x 8
    Year Event                 Class    Division Gender Organization Setting Score
   <dbl> <chr>                 <chr>    <chr>    <chr>  <chr>        <chr>   <dbl>
 1  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    711
 2  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    708
 3  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    708
 4  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    702
 5  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    700
 6  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    700
 7  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    699
 8  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    696
 9  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    694
10  2016 NFAA Indoor Nationals Compound Amateur  F      NFAA         Indoor    690
# … with 4,468 more rows

I am using the following code to compute the 10th, 50th, and 90th percentiles for each equipment class within each of the four events.

percentile_summaries <- results %>%
  select(Event, Class, Score) %>%
  group_by(Event, Class) %>%
  # 10th, 50th (median), and 90th percentile of Score per Event/Class pair
  summarize(p10 = quantile(Score, 0.10),
            p50 = median(Score),
            p90 = quantile(Score, 0.90))

That code produces the following output:

> percentile_summaries
# A tibble: 8 x 5
# Groups:   Event [?]
  Event                         Class      p10   p50   p90
  <chr>                         <chr>    <dbl> <dbl> <dbl>
1 NFAA Field Nationals          Compound  504.  538   555 
2 NFAA Field Nationals          Recurve   398.  463   496.
3 NFAA Indoor Nationals         Compound  656   704   718 
4 NFAA Indoor Nationals         Recurve   464.  554.  626 
5 USA Archery Indoor Nationals  Compound 1026. 1116  1166 
6 USA Archery Indoor Nationals  Recurve   706   959  1105 
7 USA Archery Outdoor Nationals Compound 1148. 1328. 1398 
8 USA Archery Outdoor Nationals Recurve   860. 1096  1252.
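
The "# Groups: Event" line in the printout shows that the summary is still grouped by Event, which can affect later steps. A minimal sketch of the same summary returned ungrouped follows. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and uses `toy_results`, a small hypothetical stand-in for `results` with made-up scores.

```r
library(dplyr)

# Hypothetical toy data standing in for `results` (not the real scores)
toy_results <- tibble(
  Event = rep(c("NFAA Field Nationals", "NFAA Indoor Nationals"), each = 4),
  Class = rep(c("Compound", "Compound", "Recurve", "Recurve"), times = 2),
  Score = c(504, 555, 398, 496, 656, 718, 464, 626)
)

toy_summaries <- toy_results %>%
  group_by(Event, Class) %>%
  summarize(p10 = quantile(Score, 0.10),
            p50 = median(Score),
            p90 = quantile(Score, 0.90),
            .groups = "drop")  # drop all grouping; assumes dplyr >= 1.0.0

# Check whether any grouping is left on the result
is_grouped_df(toy_summaries)
```

On older dplyr versions, piping the result into `ungroup()` should have the same effect.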

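Now I would like to spread these percentiles so that each event's row contains the three Compound percentiles followed by the three Recurve percentiles. Ultimately, I will generate an HTML table that looks (roughly) like this: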

                              Compound                 Recurve
                         p10     p50     p90     p10     p50     p90
NFAA Field Nationals     504     538     555     398     463     496
NFAA Indoor Nationals    656     704     718     464     554     626
etc.

So far, the final step of spreading the data has eluded me. Any suggestions? Thanks.

1 Answer:

Answer 0 (score: 0)

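Here is one possible solution. Starting from the percentile_summaries tibble, you can use the data.table package, whose dcast function accepts multiple columns in its value.var argument, i.e.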

library(data.table)
df <- dcast(setDT(percentile_summaries), Event ~ Class, value.var = c("p10", "p50", "p90"))

Output:

                        Event p10_Compound p10_Recurve p50_Compound p50_Recurve p90_Compound p90_Recurve
         NFAA_Field_Nationals          504         398          538         463          555         496
        NFAA_Indoor_Nationals          656         464          704         554          718         626
 USA_Archery_Indoor_Nationals         1026         706         1116         959         1166        1105
USA_Archery_Outdoor_Nationals         1148         860         1328        1096         1398        1252

The columns here are not in the desired order, because Compound and Recurve alternate. To reorder them, simply use

df[,c(1,2,4,6,3,5,7)]

Output:

                        Event p10_Compound p50_Compound p90_Compound p10_Recurve p50_Recurve p90_Recurve
         NFAA_Field_Nationals          504          538          555         398         463         496
        NFAA_Indoor_Nationals          656          704          718         464         554         626
 USA_Archery_Indoor_Nationals         1026         1116         1166         706         959        1105
USA_Archery_Outdoor_Nationals         1148         1328         1398         860        1096        1252
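
As an alternative to reshaping and then reordering by column position, the same layout can probably be reached in one step with tidyr, which the question is tagged with. This is only a sketch under assumptions: it needs tidyr 1.2.0 or later for the `names_vary` argument of `pivot_wider()`, and the data frame below is rebuilt from the rounded values printed in the question rather than from the real data.

```r
library(tibble)
library(tidyr)

# Rounded values copied from the printed percentile_summaries in the question
percentile_summaries <- tribble(
  ~Event,                          ~Class,     ~p10, ~p50, ~p90,
  "NFAA Field Nationals",          "Compound",  504,  538,  555,
  "NFAA Field Nationals",          "Recurve",   398,  463,  496,
  "NFAA Indoor Nationals",         "Compound",  656,  704,  718,
  "NFAA Indoor Nationals",         "Recurve",   464,  554,  626,
  "USA Archery Indoor Nationals",  "Compound", 1026, 1116, 1166,
  "USA Archery Indoor Nationals",  "Recurve",   706,  959, 1105,
  "USA Archery Outdoor Nationals", "Compound", 1148, 1328, 1398,
  "USA Archery Outdoor Nationals", "Recurve",   860, 1096, 1252
)

# One row per Event; one column per percentile/Class combination.
# names_vary = "slowest" keeps all columns of one Class together
# (assumes tidyr >= 1.2.0).
wide <- pivot_wider(
  percentile_summaries,
  names_from  = Class,
  values_from = c(p10, p50, p90),
  names_vary  = "slowest"
)

names(wide)
```

Because the columns are grouped by Class when they are created, no positional index such as `c(1,2,4,6,3,5,7)` is needed, so the code keeps working if another percentile is added.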

You can then go on to build the HTML table you want.
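
For the HTML step, one option is a two-level header built with knitr and kableExtra. Neither package is mentioned in the original post, so treat this as an illustrative sketch rather than the answerer's method. The `wide` data frame is rebuilt here from the rounded values shown above so that the example stands on its own.

```r
library(knitr)
library(kableExtra)

# Wide table rebuilt from the rounded values printed above
wide <- data.frame(
  Event = c("NFAA Field Nationals", "NFAA Indoor Nationals",
            "USA Archery Indoor Nationals", "USA Archery Outdoor Nationals"),
  p10_Compound = c(504, 656, 1026, 1148),
  p50_Compound = c(538, 704, 1116, 1328),
  p90_Compound = c(555, 718, 1166, 1398),
  p10_Recurve  = c(398, 464, 706, 860),
  p50_Recurve  = c(463, 554, 959, 1096),
  p90_Recurve  = c(496, 626, 1105, 1252)
)

# Lower header row: plain percentile labels, repeated for each class
base_table <- kable(
  wide,
  format    = "html",
  col.names = c("Event", rep(c("p10", "p50", "p90"), 2))
)

# Upper header row: one blank cell over Event, then a 3-column span per class
html_table <- add_header_above(
  base_table,
  c(" " = 1, "Compound" = 3, "Recurve" = 3)
)

# The result is an HTML string that can be printed or written to a file
cat(as.character(html_table))
```

The spans passed to `add_header_above()` must add up to the number of columns in the table (here 1 + 3 + 3 = 7).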