My data frame (df) looks like this:
Date Model Color Value Samples
6/19/17 Gold Blue 0.50 500
6/19/17 Gold Red 1.25 449
6/19/17 Silver Blue 0.75 1320
6/19/17 Silver Blue 1.50 103
6/19/17 Gold Red 0.70 891
6/19/17 Gold Blue 0.41 18103
6/19/17 Copper Blue 0.83 564
I can output percentile data for a given value of the Color variable using the following:
df_subset <- subset(df, df$Color == 'Blue')
quantile(df_subset$Value, c(0.50, 0.99, 0.999, 0.9999))
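However, how do I:
a) add the Samples column to the output
b) add multiple rows to the output (i.e. one row for each distinct value of the Model variable)
An example of the desired output is below: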
| # Samples | 50th percentile | 99th percentile | 99.9th percentile | 99.99th percentile
Gold
Silver
Copper
Thanks in advance for your help!
Answer 0 (score: 1)
library(data.table)
# Rebuild the example data from the question
dat <- data.table(Date = "6/19/17",
                  Model = c("Gold", "Gold", "Silver", "Silver", "Gold", "Gold", "Copper"),
                  Color = c("Blue", "Red", "Blue", "Blue", "Red", "Blue", "Blue"),
                  Value = c(0.5, 1.25, .75, 1.5, .7, .41, .83),
                  Samples = c(500, 449, 1320, 103, 891, 18103, 564))
# One row per Model: total Samples plus the requested percentiles of Value
dat[, .(Samples = sum(Samples),
        `50th percentile` = quantile(Value, probs = c(0.5)),
        `99th percentile` = quantile(Value, probs = c(0.99)),
        `99.9th percentile` = quantile(Value, probs = c(0.999)),
        `99.99th percentile` = quantile(Value, probs = c(0.9999))),
    by = Model]
Result:
Model Samples 50th percentile 99th percentile 99.9th percentile 99.99th percentile
1: Gold 19943 0.600 1.2335 1.24835 1.249835
2: Silver 1423 1.125 1.4925 1.49925 1.499925
3: Copper 564 0.830 0.8300 0.83000 0.830000
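The fractional percentiles in this result come from interpolation: with only a few values per group, quantile() (by default, type 7) interpolates linearly between the two nearest sorted values. As a rough check, the 99th percentile for Gold can be reproduced by hand. This is only a sketch; the names gold, h, lo, hi and manual are illustrative and not part of the answer above.

```r
# Gold values taken from the question's data
gold <- c(0.50, 1.25, 0.70, 0.41)
x <- sort(gold)                    # 0.41 0.50 0.70 1.25
p <- 0.99
n <- length(x)

# Type 7 (R's default): fractional position in the sorted vector
h  <- (n - 1) * p + 1              # 3.97
lo <- floor(h)                     # 3
hi <- min(lo + 1, n)               # 4, capped so that p = 1 does not run off the end

# Linear interpolation between the two neighbouring order statistics
manual <- x[lo] + (h - lo) * (x[hi] - x[lo])

manual                             # compare with the Gold row of the result table
unname(quantile(gold, p))          # the built-in computation
```

With so few observations per Model, the 99th, 99.9th and 99.99th percentiles are all just points very close to the group maximum.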
Answer 1 (score: 0)
A solution using dplyr. dt2 is the final output.
dt <- read.table(text = "Date Model Color Value Samples
6/19/17 Gold Blue 0.50 500
6/19/17 Gold Red 1.25 449
6/19/17 Silver Blue 0.75 1320
6/19/17 Silver Blue 1.50 103
6/19/17 Gold Red 0.70 891
6/19/17 Gold Blue 0.41 18103
6/19/17 Copper Blue 0.83 564",
header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
dt2 <- dt %>%
  group_by(Model) %>%
  summarise(Samples = sum(Samples),
            `50th percentile` = quantile(Value, 0.5),
            `99th percentile` = quantile(Value, 0.99),
            `99.9th percentile` = quantile(Value, 0.999),
            `99.99th percentile` = quantile(Value, 0.9999))
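For comparison, the same per-Model summary can be sketched in base R without extra packages. This is not taken from either answer; the names probs, rows and result are illustrative, and the result is a numeric matrix rather than a data frame.

```r
# Example data from the question (Date column omitted; it is not used here)
df <- data.frame(
  Model   = c("Gold", "Gold", "Silver", "Silver", "Gold", "Gold", "Copper"),
  Color   = c("Blue", "Red", "Blue", "Blue", "Red", "Blue", "Blue"),
  Value   = c(0.50, 1.25, 0.75, 1.50, 0.70, 0.41, 0.83),
  Samples = c(500, 449, 1320, 103, 891, 18103, 564),
  stringsAsFactors = FALSE
)

probs <- c(0.50, 0.99, 0.999, 0.9999)

# Split the rows by Model, then build one named numeric vector per group:
# the total Samples followed by the requested quantiles of Value
rows <- lapply(split(df, df$Model), function(d) {
  c(Samples = sum(d$Samples), quantile(d$Value, probs))
})

# Stack the per-group vectors into a matrix with one row per Model
result <- do.call(rbind, rows)
result
```

To restrict the summary to a single colour, as in the question, df could be subset first (for example subset(df, Color == 'Blue')) before the split.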