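I have a data frame from which I would like to compute a percentage, where % treated = Treated / Total visits.
For example, % treated for acute maxillary sinusitis = 93470/93470 = 100%.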
dput(droplevels(head(magma)))
structure(list(DIAG_CODE_1 = structure(c(1L, 1L, 2L, 2L, 2L,
2L), .Label = c("4610 SINUSITIS MAXILLARY ACUT", "4619 SINUSITIS ACUTE UNSP"
), class = "factor"), GENDER = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = "FEMALE", class = "factor"), AGE = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "0-2", class = "factor"), Mention_DRGU = c(5460L,
5460L, 17790L, 17790L, 9400L, 9400L), treatment_status = structure(c(1L,
2L, 1L, 2L, 1L, 2L), .Label = c("Total visits", "Treated"), class = "factor"),
diag_class_1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Acute sinusitis", class = "factor"),
year = c(2007L, 2007L, 2007L, 2007L, 2008L, 2008L)), .Names = c("DIAG_CODE_1",
"GENDER", "AGE", "Mention_DRGU", "treatment_status", "diag_class_1",
"year"), row.names = c(1285L, 1286L, 1407L, 1410L, 1408L, 1411L
), class = "data.frame")
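However, there are 432 rows. I could calculate all of these by hand, but that would be very time-consuming. Isn't that what computers are for? :p I would really appreciate it if you could help me find a way to automate this task in R.
Is there a way in R to create a result data frame that gives me DIAG_CODE_1, GENDER, AGE, % treated and year? I have created (in Excel) what I would like the output to look like, so you can see what I mean.
I will be doing this kind of calculation for other respiratory diseases, so I want to learn it now to make life easier in the long run.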
Answer 0: (score: 1)
Try this:
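# Reshape to wide: one row per diagnosis/gender/age/year, one count column per treatment_status
magma2 <- reshape(magma, idvar = c("DIAG_CODE_1", "GENDER", "AGE", "diag_class_1", "year"), timevar = "treatment_status", direction = "wide")
# Rename by name, not by position: reshape() orders the new columns by the
# first appearance of each status, so "Total visits" can come before "Treated"
names(magma2)[names(magma2) == "Mention_DRGU.Treated"] <- "Treated"
names(magma2)[names(magma2) == "Mention_DRGU.Total visits"] <- "TotVisits"
magma2$PercentageTreated <- 100 * magma2$Treated / magma2$TotVisits
head(magma2)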
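To check the reshape approach end to end, here is a minimal self-contained sketch on the six sample rows from the question. The data frame is rebuilt by hand from the dput() output, using plain character columns instead of factors (an assumption for brevity). The names `keys`, `wide` and `result` are illustrative only.

```r
# Sample rows rebuilt by hand from the dput() output in the question
# (character columns instead of factors; an assumption for brevity).
magma <- data.frame(
  DIAG_CODE_1 = c("4610 SINUSITIS MAXILLARY ACUT", "4610 SINUSITIS MAXILLARY ACUT",
                  "4619 SINUSITIS ACUTE UNSP", "4619 SINUSITIS ACUTE UNSP",
                  "4619 SINUSITIS ACUTE UNSP", "4619 SINUSITIS ACUTE UNSP"),
  GENDER = "FEMALE",
  AGE = "0-2",
  Mention_DRGU = c(5460L, 5460L, 17790L, 17790L, 9400L, 9400L),
  treatment_status = c("Total visits", "Treated", "Total visits", "Treated",
                       "Total visits", "Treated"),
  diag_class_1 = "Acute sinusitis",
  year = c(2007L, 2007L, 2007L, 2007L, 2008L, 2008L),
  stringsAsFactors = FALSE
)

# Columns that together identify one diagnosis/gender/age/year group
keys <- c("DIAG_CODE_1", "GENDER", "AGE", "diag_class_1", "year")

# One row per group, one count column per treatment_status value
wide <- reshape(magma, idvar = keys, timevar = "treatment_status", direction = "wide")

# reshape() names the new columns "<value column>.<status>"; refer to them by name
wide$PercentageTreated <- 100 * wide[["Mention_DRGU.Treated"]] /
  wide[["Mention_DRGU.Total visits"]]

# Keep only the columns asked for in the question
result <- wide[, c("DIAG_CODE_1", "GENDER", "AGE", "PercentageTreated", "year")]
print(result)
```

Referring to the reshaped columns by name means the calculation does not depend on which status happens to appear first in the full 432-row data.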
Answer 1: (score: 1)
You can use dplyr together with tidyr:
library(dplyr)
library(tidyr)
magma %>%
spread(treatment_status, Mention_DRGU) %>%
mutate(PercentageTreated=100*(Treated/`Total visits`)) %>%
select(-diag_class_1, -`Total visits`, -Treated)
#                    DIAG_CODE_1 GENDER AGE year PercentageTreated
#1 4610 SINUSITIS MAXILLARY ACUT FEMALE 0-2 2007               100
#2     4619 SINUSITIS ACUTE UNSP FEMALE 0-2 2007               100
#3     4619 SINUSITIS ACUTE UNSP FEMALE 0-2 2008               100
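In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A hedged sketch of the same pipeline with the newer function might look like the following. It assumes dplyr and tidyr (>= 1.0.0) are installed. The sample data is rebuilt by hand from the dput() output in the question, with character columns instead of factors, and `result` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample rows rebuilt by hand from the dput() output in the question
# (character columns instead of factors; an assumption for brevity).
magma <- data.frame(
  DIAG_CODE_1 = c("4610 SINUSITIS MAXILLARY ACUT", "4610 SINUSITIS MAXILLARY ACUT",
                  "4619 SINUSITIS ACUTE UNSP", "4619 SINUSITIS ACUTE UNSP",
                  "4619 SINUSITIS ACUTE UNSP", "4619 SINUSITIS ACUTE UNSP"),
  GENDER = "FEMALE",
  AGE = "0-2",
  Mention_DRGU = c(5460L, 5460L, 17790L, 17790L, 9400L, 9400L),
  treatment_status = c("Total visits", "Treated", "Total visits", "Treated",
                       "Total visits", "Treated"),
  diag_class_1 = "Acute sinusitis",
  year = c(2007L, 2007L, 2007L, 2007L, 2008L, 2008L),
  stringsAsFactors = FALSE
)

result <- magma %>%
  # One column per treatment_status value, filled with the counts
  pivot_wider(names_from = treatment_status, values_from = Mention_DRGU) %>%
  # Backticks are needed because the column name contains a space
  mutate(PercentageTreated = 100 * Treated / `Total visits`) %>%
  # Keep the columns asked for in the question
  select(DIAG_CODE_1, GENDER, AGE, PercentageTreated, year)

print(result)
```

Selecting the wanted columns explicitly, rather than dropping the unwanted ones, keeps the output stable if the full data set carries extra columns.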