Getting percentages from a list of data frames in R

Asked: 2019-08-30 22:32:29

Tags: r

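I'm new to R (a few months of online learning and reading) and have no prior coding experience.

I have been practising with a dataset I obtained from work (health care). I want to show how certain patient outcomes in this dataset change over time (by month).

I have split the data by month into separate data frames and stored them in a list. I then narrowed each data frame in the list down to the 3 postoperative outcomes I want to look at. All three outcomes are binary (Y or N).

Is there a way to calculate the percentage of "Y" for each outcome by month and store it in an object that I can then plot to show the trend over time (by month)?

Am I going about this completely the wrong way? Should I not be using a list at all?

I managed to get a list of tables of the Y and N counts, but I have no idea where to go from there.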

   list(structure(list(Mobilised_D1 = structure(c(2L, 1L, 1L, 1L, 
2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L), .Label = c("N", "Y"), class = 
"factor"), 
Catheter_rm_D1 = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"), 
Diet_D1 = structure(c(2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("N", "Y"), class = "factor")), class = 
"data.frame", row.names = 2:15), 
structure(list(Mobilised_D1 = structure(c(1L, 2L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("N", 
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L
), .Label = c("N", "Y"), class = "factor"), Diet_D1 = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("N", "Y"), class = "factor")), class = "data.frame", 
row.names = 16:31), 
structure(list(Mobilised_D1 = structure(c(2L, 1L, 1L, 2L, 
1L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"), 
    Catheter_rm_D1 = structure(c(1L, 1L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"), 
    Diet_D1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L), .Label = c("N", "Y"), class = "factor")), class = 
"data.frame", row.names = 32:42), 
structure(list(Mobilised_D1 = structure(c(2L, 1L, 1L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("N", 
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), .Label = c("N", "Y"), class = "factor"), Diet_D1 = 
structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), .Label = c("N", "Y"), class = "factor")), class = "data.frame", 
row.names = 43:60), 
structure(list(Mobilised_D1 = structure(c(1L, 1L, 1L, 2L, 
2L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 2L, NA), .Label = c("N", 
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("N", 
"Y"), class = "factor"), Diet_D1 = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("N", 
"Y"), class = "factor")), class = "data.frame", row.names = 61:74), 
structure(list(Mobilised_D1 = structure(c(1L, 2L, 2L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L), .Label = c("N", 
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(1L, 
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L
), .Label = c("N", "Y"), class = "factor"), Diet_D1 = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("N", "Y"), class = "factor")), class = "data.frame", 
row.names = 75:90))

1 answer:

Answer 0 (score: 0)

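For each component of the input list L (the list shown in the question), take the column means of x == "Y" as shown and arrange them into a multivariate time series with one row per month. Then plot it on a single panel. If you want each series in its own panel, remove facets = NULL. Multiply by 100 if you want percentages rather than proportions.

library(zoo)
library(ggplot2)

# Proportion of "Y" per outcome, one row per month.
# na.rm = TRUE skips the NA entries present in the fifth data frame.
series <- zoo( t(sapply(L, function(x) colMeans(x == "Y", na.rm = TRUE))) )
autoplot(series, facets = NULL) + geom_point()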

(continued after the plot)

[screenshot of the plot]
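To make the shape of the intermediate result concrete, here is a minimal base R sketch of the same `sapply`/`colMeans` step on a small made-up list. The object `toy`, its values, and the `month_` row labels are hypothetical and only for illustration. Dropping NA values from both the numerator and the denominator is an assumption about how missing outcomes should be treated.

```r
# Hypothetical toy data: two "months", each a data frame of Y/N factors.
toy <- list(
  data.frame(
    Mobilised_D1 = factor(c("Y", "N", "Y", "N"), levels = c("N", "Y")),
    Diet_D1      = factor(c("Y", "Y", "Y", "N"), levels = c("N", "Y"))
  ),
  data.frame(
    Mobilised_D1 = factor(c("N", "N", NA, "Y"), levels = c("N", "Y")),
    Diet_D1      = factor(c("Y", "Y", "Y", "Y"), levels = c("N", "Y"))
  )
)

# x == "Y" gives a logical matrix; colMeans() of it is the proportion of "Y".
# sapply() returns outcomes in rows, so t() puts months in rows instead.
# na.rm = TRUE drops missing values (an assumption, see above).
pct_y <- 100 * t(sapply(toy, function(x) colMeans(x == "Y", na.rm = TRUE)))
rownames(pct_y) <- paste0("month_", seq_along(toy))

print(round(pct_y, 1))
```

The resulting matrix has one row per month and one column per outcome, which is the layout that `zoo()` expects in the answer above.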

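Alternative

Another approach is to create a single data frame DF from L together with a month vector, and then aggregate by month, as shown. This relies on the fact that each row name of DF consists of the month, followed by a decimal point, followed by the row name that the row had in its original component of L.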

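# Stack all months; naming the list 1, 2, ... prefixes each row name with its month.
DF <- do.call("rbind", setNames(L, seq_along(L)))
# as.integer() truncates e.g. "3.32" to 3, which recovers the month.
month <- as.integer(rownames(DF))
# na.rm = TRUE is passed on to mean() so the NA entries in month 5 are skipped.
series <- aggregate(zoo(DF == "Y"), month, mean, na.rm = TRUE)

autoplot(series, facets = NULL) + geom_point()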
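The row-name trick used in this alternative can be checked on a small example. The sketch below uses a made-up list `toy` (hypothetical, not the asker's data) and base R only. It extracts the month with `sub()` rather than by truncation, which is a swapped-in but equivalent way to recover the prefix, and it again assumes that NA values should be dropped.

```r
# Hypothetical toy data: month 1 has 2 patients, month 2 has 4.
toy <- list(
  data.frame(Diet_D1 = factor(c("Y", "N"), levels = c("N", "Y"))),
  data.frame(Diet_D1 = factor(c("Y", "Y", "N", "Y"), levels = c("N", "Y")))
)

# Naming the list elements 1, 2, ... makes rbind() prefix each row name
# with the element name, i.e. the month.
DF <- do.call("rbind", setNames(toy, seq_along(toy)))
print(rownames(DF))

# Keep only the part of each row name before the first dot.
month <- as.integer(sub("\\..*$", "", rownames(DF)))

# Percentage of "Y" per month; na.rm = TRUE is an assumption about NA handling.
pct_y <- tapply(DF$Diet_D1 == "Y", month,
                function(v) 100 * mean(v, na.rm = TRUE))
print(pct_y)
```

Keeping everything in one data frame with an explicit month column or vector tends to be easier to extend (more outcomes, more months) than maintaining a list of per-month data frames.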