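I want to plot a grouped stacked bar chart for several variables (Var1PA, Var2PA) by counting how many times each variable is present or absent in cases and in controls.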
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1 , 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))
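I want to calculate the percentage of Present and Absent separately for cases and for controls within each variable, and I have not been able to do it with prop.table: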
vars <- c('Var1PA', 'Var2PA')
tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100)
##above line does not calculate the percentage of present absent individually for cases
##and controls within each var
If I can do that, then I can use ggplot2 to plot it:
ggplot(tt, aes(Disease, Freq)) +
geom_bar(aes(fill = Var1), position = "stack", stat="identity") + facet_grid(~vars)
How can I get the percentages for cases (Present and Absent) and controls (Present and Absent) for each variable? Thanks!
Answer 0 (score: 1)
This is a fairly simple extension of the previous question. When converting the data to long format, we treat Disease the same way as SampleID (as an identifier column that is not gathered); otherwise the code is exactly the same:

library(ggplot2)
library(tidyr)
library(dplyr)
mdf = df %>% select(SampleID, Disease, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID, -Disease) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

Then we can go straight to a plot that relies on ggplot to calculate the percentages. This is exactly the same as the plot in the previous question, but with Disease on the x-axis and faceting added.

ggplot(mdf, aes(Disease)) +
  geom_bar(aes(fill = PA), position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~Var)

If you want the percentages in a data frame, we can get them with a bit more manipulation:

df_summ = mdf %>% group_by(Disease, Var) %>%
  mutate(n = n()) %>% ## calculate n for Disease and Var groups
  group_by(Disease, Var, PA) %>%
  summarize(Percent = n() / first(n)) ## calculate the fraction P/A in each group

With that summary data frame, we can create the same plot as above more explicitly.
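The code for plotting from the summary data frame is not preserved in the text above. A minimal sketch of what that step could look like, assuming df_summ has the columns Disease, Var, PA and Percent. The data frame below is typed in by hand from the example data so that the block runs on its own, and the use of geom_col() is an assumption, not necessarily what the answer originally used:

```r
library(ggplot2)

# Fractions typed in by hand from the example data in the question;
# this stands in for the df_summ produced by the dplyr pipeline (assumption).
df_summ <- data.frame(
  Disease = rep(c("Case", "Control"), each = 4),
  Var     = rep(rep(c("Var1PA", "Var2PA"), each = 2), times = 2),
  PA      = factor(rep(c("Present", "Absent"), times = 4),
                   levels = c("Present", "Absent")),
  Percent = c(0.75, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
)

# Percent is already computed, so map it to y and stack the bars directly
# instead of letting geom_bar(position = "fill") count the rows.
p <- ggplot(df_summ, aes(x = Disease, y = Percent, fill = PA)) +
  geom_col(position = "stack") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~Var)
p
```

Because the fractions within each Disease and Var group sum to 1, the stacked bars reach 100% just as with position = "fill" in the earlier plot.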
Answer 1 (score: 0)