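Using tapply and sapply, I am trying to sum count totals over multiple (two) indices that I pass to tapply from inside sapply. The problem is that the returned matrix loses the column names I supplied to tapply. I end up using melt() to convert the matrix to a data.frame so that I can feed it to ggplot, and I have to add the variable names in a more manual way, but I would like to preserve them through the two apply() functions. When I use only one index in tapply(), the metric/variable names are retained, so I am curious why they are lost with two indices.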
library(reshape2)  # provides melt()
#Example data
Fc_desc. <- rep(c(rep("Local", 10), rep("Collector", 10), rep("Arterial", 10)), 2)
Year. <- c(rep(seq(2000, 2008, 2), 12))
df.. <- data.frame(Fc_desc = Fc_desc., Year = Year.,
                   Tot_ped_fatal_cnt = sample(length(Year.)),
                   Tot_ped_inj_lvl_a_cnt = sample(length(Year.)))
#Define metrics(columns) of interest
Metrics. <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")
#Summarize into long data frame (the sapply() result here is where the index names are lost)
Ped_FcSv.. <- melt(
  sapply(Metrics., function(x) {
    tapply(df..[, x], list(df..$Year, df..$Fc_desc), sum, na.rm = TRUE)
  }),
  varnames = c("Fc_desc", "Year", "Injury_Severity"), value.name = "Count")
Answer 0 (score: 0)
My initial workaround was to use a loop and a list:
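Metrics. <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")
TempList_ <- list()
#One Year x Fc_desc matrix of sums per metric
for (metric in Metrics.) {
  TempList_[[metric]] <- tapply(df..[, metric], list(df..$Year, df..$Fc_desc), sum)
}
TempList_YrSv <- melt(TempList_, varnames = c("Year", "Fc_desc"), value.name = "Count")
#melt() stores the list names in a column called "L1"; rename it by name rather than by position
colnames(TempList_YrSv)[colnames(TempList_YrSv) == "L1"] <- "Injury_Severity"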
This took 6 lines and ran in 0.46 seconds on my actual data of 717,000 rows.
I then adapted and applied aosmith's solution:
Cols. <- c(Metrics., "Year", "Fc_desc")
#Transpose data to long form
df_long <- melt(df..[, Cols.], measure.vars = Metrics.,
                variable.name = "Injury_Severity", value.name = "Count")
#Apply aggregate() to sum Count on 3 indices
Ped_YrSv.. <- aggregate(Count ~ Fc_desc + Year + Injury_Severity,
                        data = df_long, FUN = sum, na.rm = TRUE)
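This solution takes 3.9 seconds but needs only 3 lines. I realize I am splitting hairs, but I am trying to be more elegant and to move away from lists and loops, so this was useful. I think I can live with this. Thanks, everyone.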
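On the "why" in the question: with its default simplification, sapply() flattens each Year x Fc_desc matrix returned by tapply() into a plain vector before binding the results as columns, so the row and column labels are dropped. One way to keep the sapply()/tapply() approach is to pass simplify = "array", which stacks the matrices into a 3-D array and keeps their dimnames. The following is a minimal sketch in base R only, not taken from either solution above; the names df, metrics, flat, arr and long are placeholders chosen for this example, and the toy data only mimics the shape of the question's data.

```r
# Toy data shaped like the question's (values are arbitrary)
set.seed(1)
df <- data.frame(
  Fc_desc = rep(rep(c("Local", "Collector", "Arterial"), each = 10), 2),
  Year = rep(seq(2000, 2008, 2), 12),
  Tot_ped_fatal_cnt = sample(60),
  Tot_ped_inj_lvl_a_cnt = sample(60)
)
metrics <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")

# Sum one metric by Year (rows) and Fc_desc (columns)
sum_by_year_fc <- function(m) {
  tapply(df[[m]], list(df$Year, df$Fc_desc), sum, na.rm = TRUE)
}

# Default simplification: each 5 x 3 matrix is flattened into one column,
# so the Year and Fc_desc labels are dropped
flat <- sapply(metrics, sum_by_year_fc)
dim(flat)  # 15 2

# simplify = "array" stacks the matrices instead and keeps their dimnames
arr <- sapply(metrics, sum_by_year_fc, simplify = "array")
dim(arr)   # 5 3 2
names(dimnames(arr)) <- c("Year", "Fc_desc", "Injury_Severity")

# Base-R equivalent of melt(): one row per Year x Fc_desc x metric cell
long <- as.data.frame.table(arr, responseName = "Count")
head(long)
```

Note that as.data.frame.table() returns Year as a factor, so it may need converting back to numeric before plotting on a continuous axis. Whether this is faster than the loop or aggregate() versions on the full data set has not been measured here.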