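I have a dataset of roughly 4,000 scores from four different archery events. The dataset contains two equipment classes: Compound and Recurve. I need to display some summary statistics grouped by Event, but spread across the table by Class.
Here is some sample data: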
> results
# A tibble: 4,478 x 8
Year Event Class Division Gender Organization Setting Score
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 711
2 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 708
3 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 708
4 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 702
5 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 700
6 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 700
7 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 699
8 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 696
9 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 694
10 2016 NFAA Indoor Nationals Compound Amateur F NFAA Indoor 690
# … with 4,468 more rows
I am using the following code to compute the 10th, 50th and 90th percentiles for each equipment class within each of the four events.
library(dplyr)

# One row per Event/Class pair with the 10th, 50th and 90th percentile of Score
percentile_summaries <- results %>%
  select(Event, Class, Score) %>%
  group_by(Event, Class) %>%
  summarize(p10 = quantile(Score, 0.10),
            p50 = median(Score),
            p90 = quantile(Score, 0.90))
This code produces the following output:
> percentile_summaries
# A tibble: 8 x 5
# Groups: Event [?]
Event Class p10 p50 p90
<chr> <chr> <dbl> <dbl> <dbl>
1 NFAA Field Nationals Compound 504. 538 555
2 NFAA Field Nationals Recurve 398. 463 496.
3 NFAA Indoor Nationals Compound 656 704 718
4 NFAA Indoor Nationals Recurve 464. 554. 626
5 USA Archery Indoor Nationals Compound 1026. 1116 1166
6 USA Archery Indoor Nationals Recurve 706 959 1105
7 USA Archery Outdoor Nationals Compound 1148. 1328. 1398
8 USA Archery Outdoor Nationals Recurve 860. 1096 1252.
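Now I want to spread these percentiles so that each event's row holds the three Compound percentiles followed by the three Recurve percentiles. Ultimately I will generate an HTML table that looks (roughly) like this: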
Compound Recurve
p10 p50 p90 p10 p50 p90
NFAA Field Nationals 504 538 555 398 463 496
NFAA Indoor Nationals 656 704 718 464 554 626
etc.
So far, that last step of spreading the data has eluded me. Any suggestions? Thanks.
Answer 0 (score: 0)
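Here is one possible solution. Starting from the percentile_summaries tibble, you can use the data.table package, whose dcast function accepts multiple columns in its value.var argument, i.e.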
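library(data.table)
# setDT() converts percentile_summaries to a data.table by reference;
# dcast() then creates one column per value.var/Class combination
df <- dcast(setDT(percentile_summaries), Event ~ Class, value.var = c("p10", "p50", "p90"))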
Output:
Event p10_Compound p10_Recurve p50_Compound p50_Recurve p90_Compound p90_Recurve
NFAA_Field_Nationals 504 398 538 463 555 496
NFAA_Indoor_Nationals 656 464 704 554 718 626
USA_Archery_Indoor_Nationals 1026 706 1116 959 1166 1105
USA_Archery_Outdoor_Nationals 1148 860 1328 1096 1398 1252
The column order here is not the desired one, because the Compound and Recurve columns alternate. To reorder them, simply use
df[,c(1,2,4,6,3,5,7)]
Output:
Event p10_Compound p50_Compound p90_Compound p10_Recurve p50_Recurve p90_Recurve
NFAA_Field_Nationals 504 538 555 398 463 496
NFAA_Indoor_Nationals 656 704 718 464 554 626
USA_Archery_Indoor_Nationals 1026 1116 1166 706 959 1105
USA_Archery_Outdoor_Nationals 1148 1328 1398 860 1096 1252
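Positional indices such as c(1,2,4,6,3,5,7) break silently if a class or percentile is added later. As a minimal sketch, the same reordering can be driven by column names instead. The small table below is rebuilt by hand from the rounded values printed above (first two events only) so the example runs on its own; the names wide, classes, stats and wanted are illustrative and not part of the original answer.

```r
library(data.table)

# Rounded values copied from the printed percentile_summaries (two events only)
percentile_summaries <- data.table(
  Event = rep(c("NFAA Field Nationals", "NFAA Indoor Nationals"), each = 2),
  Class = rep(c("Compound", "Recurve"), times = 2),
  p10 = c(504, 398, 656, 464),
  p50 = c(538, 463, 704, 554),
  p90 = c(555, 496, 718, 626)
)

# Same reshaping step as in the answer
wide <- dcast(percentile_summaries, Event ~ Class,
              value.var = c("p10", "p50", "p90"))

# Build the wanted order by name: all Compound columns, then all Recurve columns
classes <- c("Compound", "Recurve")
stats <- c("p10", "p50", "p90")
wanted <- c("Event",
            paste0(rep(stats, times = length(classes)), "_",
                   rep(classes, each = length(stats))))

# setcolorder() reorders the columns of wide in place
setcolorder(wide, wanted)
print(wide)
```

Because the order is derived from classes and stats, adding another percentile only requires extending stats.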
You can then go on to build the desired HTML table.
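For the HTML step, one option (not something the original answer specifies) is knitr::kable together with kableExtra::add_header_above, which puts a grouping row labelled Compound and Recurve above the p10/p50/p90 columns. The sketch below assumes both packages are installed and uses a hand-built data frame with the rounded values for the first two events; wide, tbl and html are illustrative names.

```r
library(knitr)
library(kableExtra)

# Reordered wide table, rebuilt by hand from the rounded values shown above
wide <- data.frame(
  Event = c("NFAA Field Nationals", "NFAA Indoor Nationals"),
  p10_Compound = c(504, 656),
  p50_Compound = c(538, 704),
  p90_Compound = c(555, 718),
  p10_Recurve = c(398, 464),
  p50_Recurve = c(463, 554),
  p90_Recurve = c(496, 626)
)

# Show only the percentile labels in the second header row
tbl <- kable(wide, format = "html",
             col.names = c("Event", rep(c("p10", "p50", "p90"), times = 2)))

# Grouping row: one blank cell over Event, then one label spanning three columns per class
tbl <- add_header_above(tbl, c(" " = 1, "Compound" = 3, "Recurve" = 3))

# The result is HTML markup that can be written to a file or embedded in a report
html <- as.character(tbl)
cat(html)
```

The spans in add_header_above must add up to the number of columns (1 + 3 + 3 = 7 here), so they need adjusting if percentiles or classes are added.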