Question

我的数据框（df）如下所示：

Date        Model   Color   Value  Samples
6/19/17     Gold    Blue    0.50   500
6/19/17     Gold    Red     1.25   449
6/19/17     Silver  Blue    0.75   1320
6/19/17     Silver  Blue    1.50   103
6/19/17     Gold    Red     0.70   891
6/19/17     Gold    Blue    0.41   18103
6/19/17     Copper  Blue    0.83   564

我可以使用以下内容输出每个Color变量的百分位数据：

df_subset <- subset(df, df$Color == 'Blue')
quantile(df_subset$Value, c(0.50, 0.99, 0.999, 0.9999))

但是，我该怎么做？

a）将Samples列添加到输出

b）在输出中添加多行（即Model变量的每个不同值都有一行）

一个例子如下：

            |  # Samples  |  50th percentile |  99th percentile |  99.9th percentile |  99.99th percentile
Gold
Silver
Copper

提前感谢您的帮助！

Answer 1

library(data.table)

dat <- data.table(Date = "6/19/17",
                  Model = c("Gold", "Gold", "Silver", "Silver", "Gold", "Gold", "Copper"),
                  Color = c("Blue", "Red", "Blue", "Blue", "Red", "Blue", "Blue"),
                  Value = c(0.5, 1.25, .75, 1.5, .7, .41, .83),
                  Samples = c(500, 449, 1320, 103, 891, 18103, 564))

dat[, .(Samples = sum(Samples),
        `50th percentile` = quantile(Value, probs = c(0.5)),
        `99th percentile` = quantile(Value, probs = c(0.99)),
        `99.9th percentile` = quantile(Value, probs = c(0.999)),
        `99.99th percentile` = quantile(Value, probs = c(0.9999))), 
    by = Model]

结果：

    Model Samples 50th percentile 99th percentile 99.9th percentile 99.99th percentile
1:   Gold   19943           0.600          1.2335           1.24835           1.249835
2: Silver    1423           1.125          1.4925           1.49925           1.499925
3: Copper     564           0.830          0.8300           0.83000           0.830000

Answer 2

使用dplyr的解决方案。 dt2是最终输出。

dt <- read.table(text = "Date        Model   Color   Value  Samples
6/19/17     Gold    Blue    0.50   500
                 6/19/17     Gold    Red     1.25   449
                 6/19/17     Silver  Blue    0.75   1320
                 6/19/17     Silver  Blue    1.50   103
                 6/19/17     Gold    Red     0.70   891
                 6/19/17     Gold    Blue    0.41   18103
                 6/19/17     Copper  Blue    0.83   564",
                 header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

dt2 <- dt %>%
  group_by(Model) %>%
  summarise(Samples = sum(Samples),
            `50th percentile` = quantile(Value, 0.5),
            `99th percentile` = quantile(Value, 0.99),
            `99.9th percentile` = quantile(Value, 0.999),
            `99.99th percentile` = quantile(Value, 0.9999))

R中表格形式的百分位数据

2 个答案: