Question

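I have a data set with two columns and n rows. One column is price and the other is quantity. I want to reshape it into a 100 × 2 data frame in which one column is "quantity", with each row representing 1% of the total quantity, and the other column is price, with values taken from the original data set. How can I reshape it? Do I need to define a function?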

I think this may be a piecewise-function problem, but I don't know how to solve it.

Here is an example of the original data set, although the real data set has many more rows.

df <- data.frame(price = c(2,2,rep(3,3),rep(4,4)),
                 quantity = c(rep(1,3),2,3,3,4,5,5))

This is the reshaped data set I expect.

# Q is an example of every 1% of sum(df$quantity)
expected.df <- data.frame(Q=paste(c(1:100),'%',sep=""),
                          P=c(rep(2,8),rep(3,24),rep(4,68)))

Thanks to anyone who can help!

Answer 1

There are several ways to do this; here I will use the dplyr package.

library(dplyr)

df <- data.frame(price = c(2,2,rep(3,3),rep(4,4)),
                 quantity = c(rep(1,3),2,3,3,4,5,5))
df

> df
  price quantity
1     2        1
2     2        1
3     3        1
4     3        2
5     3        3
6     4        3
7     4        4
8     4        5
9     4        5

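# Total quantity, used as the denominator for the percentages
xx  <- sum(df$quantity)
df1 <- df %>% 
  # Running total of quantity, then expressed as a percentage of the total
  dplyr::mutate(Q_perc = cumsum(quantity),
                Q_perc = paste((Q_perc/xx)*100,"%")) %>% 
  # Drop the raw quantity column
  dplyr::select(-quantity)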

> df1
  price Q_perc
1     2    4 %
2     2    8 %
3     3   12 %
4     3   20 %
5     3   32 %
6     4   44 %
7     4   60 %
8     4   80 %
9     4  100 %
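
The cumulative percentages mark where each original row ends on a 0 to 100% scale. To get from there to the 100-row data frame asked for in the question, one option is to look up, for every percent step, the first row whose cumulative share covers that step. The following is a minimal sketch in base R, not part of the original answer. It assumes that a step falling exactly on a boundary (such as 8%) belongs to the row that ends there, and the names `cum_pct`, `steps`, `idx`, and `result` are introduced here only for illustration.

```r
df <- data.frame(price = c(2,2,rep(3,3),rep(4,4)),
                 quantity = c(rep(1,3),2,3,3,4,5,5))

# Cumulative share of the total quantity reached at the end of each row, in percent
cum_pct <- 100 * cumsum(df$quantity) / sum(df$quantity)

# The 100 percent steps we want one row for
steps <- 1:100

# findInterval() with left.open = TRUE counts how many cumulative shares are
# strictly below each step; adding 1 gives the first row that covers the step
idx <- findInterval(steps, cum_pct, left.open = TRUE) + 1

# Assemble the 100 x 2 data frame: percent label and the matching price
result <- data.frame(Q = paste0(steps, "%"),
                     P = df$price[idx])
```

For the example data, `result` has 8 rows with price 2, 24 rows with price 3, and 68 rows with price 4, which matches `expected.df` in the question. With real data whose boundaries do not fall on whole percentages, floating-point rounding near a boundary may need extra care.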
