I have the following data frame in R:
Service Container_Pick_Day
ABC 0
ABC 1
ABC 1
ABC 2
ABC NA
ABC 0
ABC 1
DEF NA
DEF 0
DEF 1
DEF 1
DEF 1
DEF 2
DEF 1
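The Container_Pick_Day column is numeric and contains NA values.
What I want to do is calculate, for each Service, the percentage of containers picked on day 0, after 1 day, after 2 days, and so on, ignoring the NA values.
The desired data frame would be: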
Service Container_Pick_Day Percentage
ABC 0 (2/6)*100 = 33.33
ABC 1 (3/6)*100 = 50
ABC 2 (1/6)*100 = 16.67
DEF 0 (1/6)*100 = 16.67
DEF 1 (4/6)*100 = 66.67
DEF 2 (1/6)*100 = 16.67
I tried the following in R, but it produces NA values in the output:
df%>%
group_by(Service) %>%
summarise(pick_day_perc = n()/sum(Container_Pick_Day),na.rm=T) %>%
as.data.frame()
Do I have to group by both Service and Container_Pick_Day?
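A likely reason for the NA values is that `na.rm=T` sits outside `sum()` in the attempt above, so `sum()` still sees the missing values (and `summarise()` appears to treat `na.rm` as just another column to create). The base R sketch below illustrates this on a toy vector; `days`, `counts`, and `pct` are illustrative names, and the values are assumed to mirror the ABC rows.

```r
# Toy vector mirroring the ABC rows of the question (assumed values).
days <- c(0, 1, 1, 2, NA, 0, 1)

# sum() propagates NA unless na.rm = TRUE is passed to sum() itself.
sum(days)                # NA
sum(days, na.rm = TRUE)  # 5

# Even with the NA handled, dividing a row count by the sum of the day
# values is not a share. The share of each day is its count divided by
# the number of non-missing observations.
counts <- table(days)    # table() drops NA by default
pct <- as.vector(counts) / sum(counts) * 100
names(pct) <- names(counts)
round(pct, 2)
```

So the counting has to happen per Service and per Container_Pick_Day, with the NA rows removed first.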
Answer 0 (score: 5)
Adding an answer based on all the comments above from @nicola, @akrun, and myself:
library(dplyr)
#nicola
df %>%
filter(!is.na(Container_Pick_Day)) %>%
group_by(Service,Container_Pick_Day) %>%
summarise(Percentage=n()) %>%
group_by(Service) %>%
mutate(Percentage=Percentage/sum(Percentage)*100)
#akrun
df %>%
filter(complete.cases(Container_Pick_Day)) %>%
count(Service, Container_Pick_Day) %>%
group_by(Service) %>%
transmute(Container_Pick_Day, Percentage=n/sum(n)*100)
#Sotos
df %>%
na.omit() %>%
group_by_all() %>%
summarise(ptg = n()) %>%
group_by(Service) %>%
mutate(ptg = prop.table(ptg)*100)
All of them give:
  Service Container_Pick_Day Percentage
   <fctr>              <int>      <dbl>
1     ABC                  0   33.33333
2     ABC                  1   50.00000
3     ABC                  2   16.66667
4     DEF                  0   16.66667
5     DEF                  1   66.66667
6     DEF                  2   16.66667
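For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the data frame from the values in the question and runs a `count()`-based variant of the pipelines above. The names `result` and `check` are illustrative, and the base R cross-check with `prop.table()` is an addition, not one of the commented approaches.

```r
library(dplyr)

# Rebuild the data frame from the question (values copied from the post).
df <- data.frame(
  Service = rep(c("ABC", "DEF"), each = 7),
  Container_Pick_Day = c(0, 1, 1, 2, NA, 0, 1,
                         NA, 0, 1, 1, 1, 2, 1)
)

# Drop NA rows, count each Service/day pair, then convert the counts
# to percentages within each Service.
result <- df %>%
  filter(!is.na(Container_Pick_Day)) %>%
  count(Service, Container_Pick_Day) %>%
  group_by(Service) %>%
  mutate(Percentage = n / sum(n) * 100) %>%
  ungroup()

# Base R cross-check: row-wise proportions of the contingency table.
# table() ignores NA by default, so no explicit filtering is needed here.
check <- prop.table(table(df$Service, df$Container_Pick_Day), margin = 1) * 100

print(result)
print(round(check, 2))
```

Both `result` and `check` hold the same percentages; the first is in long format (one row per Service and day), the second is a Service-by-day matrix.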