I have the following data frame (this question is related to [this thread]):
df = data.frame(c("2012","2012","2012","2013"),
c("AAA","BBB","AAA","AAA"),
c("X","Not-serviced","X","Y"),
c("2","10","3","2.5"))
colnames(df) = c("year","type","service_type","waiting_time")
I want to get the average waiting time for the serviced and not-serviced groups. This is how the data is grouped:
library(data.table)
setDT(df)[, .(num_serviced = sum(service_type != "Not-serviced"),
num_notserviced = sum(service_type == "Not-serviced"),
avg_wt = mean(waiting_time)), ## THE PROBLEM HERE!!!
.(year, type)][, Total := num_serviced + num_notserviced][]
But avg_wt = mean(waiting_time) computes the average waiting time over the whole group. What I need instead is avg_wt_serviced and avg_wt_notserviced.
The result should be:
year type num_serviced num_notserviced num_total avg_wt_serviced avg_wt_notserviced
2012 AAA 2 0 2 2.5 0
Answer 0 (score: 2)
Using dplyr, we can take the mean of waiting_time subset by service_type:
library(dplyr)
df %>%
group_by(year,type) %>%
summarise(num_serviced = sum(service_type != "Not-serviced"),
num_notserviced = sum(service_type == "Not-serviced"),
num_total = num_serviced + num_notserviced,
avg_wt_serv = mean(waiting_time[service_type != "Not-serviced"]),
avg_wt_notser = mean(waiting_time[service_type == "Not-serviced"]))
# year type num_serviced num_notserviced num_total avg_wt_serv avg_wt_notser
# <fctr> <fctr> <int> <int> <int> <dbl> <dbl>
#1 2012 AAA 2 0 2 2.5 NaN
#2 2012 BBB 0 1 1 NaN 10
#3 2013 AAA 1 0 1 2.5 NaN
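The expected output in the question shows 0 rather than NaN for a group with no matching rows. A minimal sketch of one way to get that, assuming waiting_time is numeric; the helper name mean_or_zero is hypothetical and not part of the answer above:

```r
# Hypothetical helper: mean of x, or 0 when x has no elements
mean_or_zero <- function(x) {
  if (length(x) == 0) 0 else mean(x)
}

# Values from the 2012 / AAA group in the question
waiting_time <- c(2, 3)
service_type <- c("X", "X")

avg_serviced    <- mean_or_zero(waiting_time[service_type != "Not-serviced"])
avg_notserviced <- mean_or_zero(waiting_time[service_type == "Not-serviced"])

print(c(avg_serviced, avg_notserviced))
```

Inside summarise, this would stand in for the bare mean(...) calls. Whether 0 is a sensible placeholder for "no observations" depends on how the result is used later; NaN is arguably the more honest value.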
Answer 1 (score: 2)
Here it is:
In your data frame, waiting_time must be numeric for mean to work; see as.numeric() for the conversion.
df = data.frame(c("2012","2012","2012","2013"),
c("AAA","BBB","AAA","AAA"),
c("X","Not-serviced","X","Y"),
c(2,10,3,2.5))
colnames(df) = c("year","type","service_type","waiting_time")
library(data.table)
setDT(df)[, .(num_serviced = sum(service_type != "Not-serviced"),
num_notserviced = sum(service_type =="Not-serviced"),
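# average only over the matching rows of each group; 0 when the group has none
avg_wt_serviced = if (any(service_type != "Not-serviced")) mean(waiting_time[service_type != "Not-serviced"]) else 0,
avg_wt_notserviced = if (any(service_type == "Not-serviced")) mean(waiting_time[service_type == "Not-serviced"]) else 0),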
.(year, type)][, Total := num_serviced + num_notserviced][]
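Answer 2 (score: 0)
The problem seems to be the quoted column.
Edit/addition: because of the quotes, the column is read as a factor variable; see class(df$waiting_time).
Adding this line before the calculation gave me the correct answer: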
df$waiting_time<- as.numeric(as.character(df$waiting_time))
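The as.character() step matters because calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A small sketch using the question's waiting times, with illustrative variable names:

```r
# waiting_time as it looks when the quoted values are read as a factor
wt <- factor(c("2", "10", "3", "2.5"))

codes  <- as.numeric(wt)                 # integer level codes, not the values
values <- as.numeric(as.character(wt))   # the actual numbers: 2, 10, 3, 2.5

print(mean(values))  # 4.375
```

In R 4.0.0 and later, data.frame() defaults to stringsAsFactors = FALSE, so the quoted column would arrive as character instead of factor. The conversion above still works in that case, since as.character() on a character vector leaves it unchanged.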