Say I have the following test results in the data frame scores:
name firstname score
1 McKay Rodney 4
2 McKay Rodney 2
3 McKay Rodney 5
4 Weir Elizabeth 1
5 Weir Elizabeth 8
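I want to compute quantiles of the score distribution for each person. If I only want one fixed quantile (for example, the median), I can do the following: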
quantile_df <- score_df %>%
  group_by(name, firstname) %>%
  summarize(q50 = median(score))
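The resulting data frame will have the columns name, firstname, and q50. This does not scale if I want to compute an arbitrary number of quantiles. Suppose I want three (for now); the result would then look like this (the numbers are meaningless):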
name firstname q quantiles
1 McKay Rodney 0.25 1
2 McKay Rodney 0.50 3
3 McKay Rodney 0.75 7
4 Weir Elizabeth 0.25 2
5 Weir Elizabeth 0.50 4
6 Weir Elizabeth 0.75 6
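It feels like dplyr should be the right package to find something for this, but I have not found anything. Instead, I would implement the following: a function for mapply that is applied to the rows of a data frame containing name and firstname. This function would filter scores so that both name and first name match, and extract the scores. It would return a data frame containing name, firstname, q, and quantiles. The result would then be joined with the scores data frame to bring in any remaining columns (if there are any). Does such functionality already exist in a common R library?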
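For reference, the manual approach described above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the helper name quantiles_for and the vector probs are hypothetical names, and the data is the five-row example from the top of the question.

```r
# Example data from the question (stringsAsFactors = FALSE keeps names as text)
scores <- data.frame(
  name      = c("McKay", "McKay", "McKay", "Weir", "Weir"),
  firstname = c("Rodney", "Rodney", "Rodney", "Elizabeth", "Elizabeth"),
  score     = c(4, 2, 5, 1, 8),
  stringsAsFactors = FALSE
)

# Hypothetical choice of probabilities for this sketch
probs <- c(0.25, 0.5, 0.75)

# One row per person
people <- unique(scores[, c("name", "firstname")])

# Hypothetical helper: filter the scores of one person and
# return one row per requested quantile
quantiles_for <- function(name, firstname) {
  s <- scores$score[scores$name == name & scores$firstname == firstname]
  data.frame(
    name      = name,
    firstname = firstname,
    q         = probs,
    quantiles = unname(quantile(s, probs)),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every person and stack the pieces
per_person  <- mapply(quantiles_for, people$name, people$firstname,
                      SIMPLIFY = FALSE)
quantile_df <- do.call(rbind, per_person)
rownames(quantile_df) <- NULL
```

This works, but it rescans scores once per person, which is the kind of bookkeeping a grouped operation would normally take care of.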
Answer 0: (score: 3)
You can store the values in a list column and expand it with unnest(), i.e.
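library(tidyverse)
# df is the data frame of scores from the question
df %>%
  group_by(name, firstname) %>%
  summarise(new = list(quantile(score))) %>%
  unnest(new)  # expand the list column into one row per quantile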
which gives,
# A tibble: 10 x 3
# Groups:   name [2]
   name  firstname   new
   <fct> <fct>     <dbl>
 1 McKay Rodney     2.00
 2 McKay Rodney     3.00
 3 McKay Rodney     4.00
 4 McKay Rodney     4.50
 5 McKay Rodney     5.00
 6 Weir  Elizabeth  1.00
 7 Weir  Elizabeth  2.75
 8 Weir  Elizabeth  4.50
 9 Weir  Elizabeth  6.25
10 Weir  Elizabeth  8.00
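The output above uses quantile()'s default probabilities and has no column saying which quantile each row holds. A minimal sketch of one way to get the q and quantiles columns the question asks for is to store both the probabilities and the values as list columns and unnest them together. The vector probs is an assumption made for this sketch, and the data is the example from the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question
scores <- data.frame(
  name      = c("McKay", "McKay", "McKay", "Weir", "Weir"),
  firstname = c("Rodney", "Rodney", "Rodney", "Elizabeth", "Elizabeth"),
  score     = c(4, 2, 5, 1, 8)
)

# Hypothetical choice of probabilities for this sketch
probs <- c(0.25, 0.5, 0.75)

quantile_df <- scores %>%
  group_by(name, firstname) %>%
  summarise(
    q         = list(probs),                           # which quantile
    quantiles = list(unname(quantile(score, probs))),  # its value
    .groups   = "drop"
  ) %>%
  unnest(cols = c(q, quantiles))  # expand both list columns in parallel
```

Changing probs to any other vector of probabilities changes the number of rows per person without touching the rest of the pipeline.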
Answer 1: (score: 0)
A data.table answer:
score_df <- data.frame(name = sample(c('Bob', 'Sue', 'Lorna'), 100, TRUE),
                       score = sample(1:100))
library(data.table)
setDT(score_df)
score_df[, quantile(score), name]
# name V1
# 1: Bob 1.00
# 2: Bob 20.00
# 3: Bob 41.00
# 4: Bob 82.00
# 5: Bob 99.00
# 6: Lorna 2.00
# 7: Lorna 23.00
# 8: Lorna 52.00
# 9: Lorna 77.00
# 10: Lorna 100.00
# 11: Sue 7.00
# 12: Sue 33.75
# 13: Sue 50.00
# 14: Sue 64.50
# 15: Sue 94.00
Or, if you want to include the percentages
score_df[, {qu <- quantile(score)
.(q = names(qu), quantiles = qu)}
, name]
# name q quantiles
# 1: Bob 0% 1.00
# 2: Bob 25% 20.00
# 3: Bob 50% 41.00
# 4: Bob 75% 82.00
# 5: Bob 100% 99.00
# 6: Lorna 0% 2.00
# 7: Lorna 25% 23.00
# 8: Lorna 50% 52.00
# 9: Lorna 75% 77.00
# 10: Lorna 100% 100.00
# 11: Sue 0% 7.00
# 12: Sue 25% 33.75
# 13: Sue 50% 50.00
# 14: Sue 75% 64.50
# 15: Sue 100% 94.00
To group by two columns in data.table, you can do, for example
score_df[, quantile(score), .(name, firstname)]
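As a sketch of how the two-column grouping combines with a chosen set of probabilities, the following uses the five-row example from the question rather than the random data above. The vector probs is an assumption made for this illustration.

```r
library(data.table)

# Example data from the question
scores <- data.table(
  name      = c("McKay", "McKay", "McKay", "Weir", "Weir"),
  firstname = c("Rodney", "Rodney", "Rodney", "Elizabeth", "Elizabeth"),
  score     = c(4, 2, 5, 1, 8)
)

# Hypothetical choice of probabilities for this sketch
probs <- c(0.25, 0.5, 0.75)

# One row per person and requested quantile
quantile_dt <- scores[
  , .(q = probs, quantiles = quantile(score, probs, names = FALSE))
  , by = .(name, firstname)
]
```

Here q holds the numeric probability instead of the "25%" style labels, which may be easier to filter or join on later.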
If you happen to have tibble or tidyverse loaded as well, you can also do
library(tidyverse)
score_df[, enframe(quantile(score), 'q')
, name]
# name q value
# 1: Lorna 0% 9.0
# 2: Lorna 25% 35.0
# 3: Lorna 50% 65.5
# 4: Lorna 75% 85.0
# 5: Lorna 100% 97.0
# 6: Bob 0% 7.0
# 7: Bob 25% 24.5
# 8: Bob 50% 48.0
# 9: Bob 75% 65.5
# 10: Bob 100% 100.0
# 11: Sue 0% 1.0
# 12: Sue 25% 19.0
# 13: Sue 50% 40.0
# 14: Sue 75% 67.0
# 15: Sue 100% 98.0