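I found a similar question and answer for Python ("identify records that make up 90% of total"), but I could not quite translate it to R.
I am trying to find the smallest number of products that together account for at least 80% of sales (the percentage is a variable, since it can change).
For example: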
Product Sales
A 100
B 40
C 10
D 15
Total 165
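The answer should be that two items are enough to reach 132 (80% of total sales), since A and B together sell 140. The output should look like this: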
Product Sales
A 100
B 40
Any help would be appreciated!
Answer 0 (score: 1)
How about a dplyr solution?
Edit:
Here is a proper solution:
library(dplyr)
# your threshold, as a fraction of total sales (0.5 here; the question uses 0.8)
constant <- 0.5
data %>%
  # order by sales, largest first
  arrange(desc(Sales)) %>%
  # add the cumulative share of total sales
  mutate(cumulative = round(cumsum(Sales) / sum(Sales), 2),
         # add a threshold: the difference between the cumulative share and the constant
         threshold = cumulative - constant) %>%
  # last, keep every row up to the first one that reaches the constant
  # (>= 0 rather than > 0, so a share exactly equal to the constant counts)
  filter(threshold <= min(threshold[threshold >= 0]))
# for 0.8
  Product Sales cumulative threshold
1       A   100       0.61     -0.19
2       B    40       0.85      0.05
# for 0.5
  Product Sales cumulative threshold
1       A   100       0.61      0.11
With data:
data <- read.table(text ="Product Sales
A 100
B 40
C 10
D 15", header = T)
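The answer above rounds the cumulative share to two decimals before comparing it with the threshold, so a share such as 0.796 would count as having reached 0.80. A minimal sketch of a variant that compares unrounded shares is shown here. The helper name `top_share` and the data frame name `sales_data` are hypothetical and chosen only for illustration. The idea is to keep a row whenever the rows before it have not yet reached the target share.

```r
library(dplyr)

# Same example data as in the question
sales_data <- data.frame(Product = c("A", "B", "C", "D"),
                         Sales = c(100, 40, 10, 15))

# Hypothetical helper (not part of dplyr): smallest set of top sellers
# whose combined share of total sales is at least `share`
top_share <- function(df, share) {
  df %>%
    arrange(desc(Sales)) %>%
    mutate(cumulative = cumsum(Sales) / sum(Sales)) %>%
    # keep a row if the rows before it have not yet reached the target
    filter(dplyr::lag(cumulative, default = 0) < share)
}

top_share(sales_data, 0.8)  # rows for A and B
top_share(sales_data, 0.5)  # row for A only
```

Because the comparison uses the share accumulated before each row, the row that crosses the target is kept and everything after it is dropped, without needing a separate threshold column.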
Answer 1 (score: 0)
s_t's answer is simple and effective, but in case you are looking for a base R solution wrapped in a function:
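example <- data.frame(Product = c("A", "B", "C", "D"), Sales = c(100, 40, 10, 15))
min.products <- function(Product, Sales, percent){
  df <- data.frame(Product, Sales)
  # sales total that has to be reached
  minimum <- percent*sum(df$Sales)
  # sort by sales, largest first
  df <- df[order(-df$Sales), ]
  # position of the first product at which the running total reaches the minimum
  # (>= so that "at least" is respected; taking the first n rows also avoids
  # pulling in extra products that merely tie on Sales)
  n <- which(cumsum(df$Sales) >= minimum)[1]
  answer <- df$Product[seq_len(n)]
  return(answer)
}
min.products(example$Product, example$Sales, 0.8)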
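Since the question notes that the percentage can change, it may help to see how the number of required products varies with it. The following is only a sketch in base R. The helper name `min_rows` is hypothetical, and it returns the full rows rather than just the product names so the output matches the format requested in the question.

```r
example <- data.frame(Product = c("A", "B", "C", "D"),
                      Sales = c(100, 40, 10, 15))

# Hypothetical helper: the top-selling rows needed to reach `percent` of sales
min_rows <- function(df, percent) {
  # sort by sales, largest first
  df <- df[order(-df$Sales), ]
  # index of the first row at which the running total reaches the target
  n <- which(cumsum(df$Sales) >= percent * sum(df$Sales))[1]
  df[seq_len(n), ]
}

min_rows(example, 0.8)  # rows for A and B

# number of products needed for a few thresholds
sapply(c(0.5, 0.8, 0.9, 1), function(p) nrow(min_rows(example, p)))
# → 1 2 3 4
```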