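I have a data frame (df) with two variables, site and purchase.
I want to use dplyr to group the data by site and purchase and get the count and percentage for each group. However, I would also like rows labelled ALLSITES that represent the data for all sites combined, grouped by purchase, so that I end up with a tibble similar to dfgoal.
The problem is that my current code does not give me the ALLSITES rows. I tried adding a base R function inside the dplyr pipeline, but that did not work.
Any help is greatly appreciated.
Starting point (df):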
df <- data.frame(site=c("LON","MAD","PAR","MAD","PAR","MAD","PAR","MAD","PAR","LON","MAD","LON","MAD","MAD","MAD"),purchase=c("a1","a2","a1","a1","a1","a1","a1","a1","a1","a2","a1","a2","a1","a2","a1"))
Desired result:
dfgoal <- data.frame(site=c("LON","LON","MAD","MAD","PAR","ALLSITES","ALLSITES"),purchase=c("a1","a2","a1","a2","a1","a1","a2"),bin=c(1,2,6,2,4,11,4),bin_per=c(33.33333,66.66667,75.00000,25.00000,100.00000,73.33333,26.66667))
Current code:
library(dplyr)
df %>%
group_by(site, purchase) %>%
summarize(bin = sum(purchase==purchase)) %>%
group_by(site) %>%
mutate(bin_per = (bin/sum(bin)*100))
df %>%
rbind(df, transform(df, site = "ALLSITES") %>%
group_by(site, purchase) %>%
summarize(bin = sum(purchase==purchase)) %>%
group_by(site) %>%
mutate(bin_per = (bin/sum(bin)*100))
Answer 0 (score: 1)
We can start from the output of the first code block, group it by 'purchase', set 'site' to "ALLSITES", get the sum of 'bin' and then compute 'bin_per', and finally row-bind the two datasets with bind_rows.
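The answer's code did not survive, so here is a minimal sketch of the steps described above, not necessarily the answerer's exact code. The names per_site, all_sites and result are made up for illustration, and count() is used as a shortcut for the group-and-count step.

```r
library(dplyr)

# Data from the question; stringsAsFactors = FALSE keeps 'site' as character
# so that "ALLSITES" can be bound on without factor-level warnings.
df <- data.frame(
  site = c("LON","MAD","PAR","MAD","PAR","MAD","PAR","MAD","PAR","LON","MAD","LON","MAD","MAD","MAD"),
  purchase = c("a1","a2","a1","a1","a1","a1","a1","a1","a1","a2","a1","a2","a1","a2","a1"),
  stringsAsFactors = FALSE
)

# Step 1: count per site/purchase, then percentage within each site.
per_site <- df %>%
  count(site, purchase, name = "bin") %>%
  group_by(site) %>%
  mutate(bin_per = bin / sum(bin) * 100) %>%
  ungroup()

# Step 2: collapse the per-site counts over all sites, by purchase.
all_sites <- per_site %>%
  group_by(purchase) %>%
  summarise(bin = sum(bin)) %>%
  mutate(site = "ALLSITES", bin_per = bin / sum(bin) * 100)

# Step 3: row-bind the two pieces; bind_rows() matches columns by name.
result <- bind_rows(per_site, all_sites)
result
```

Building the ALLSITES rows from the already summarised counts avoids duplicating the raw data, and the percentages are computed separately for the ALLSITES block so they sum to 100 on their own.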