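Straight to the question.
I have a data.frame (structure below) made up mainly of categorical variables: most are binary (Yes or No), and one has three levels (data.frame$tertile).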
'data.frame':
$ smoker : Factor w/ 2 levels "Yes","No":
$ mi : Factor w/ 2 levels "Yes","No":
$ angina : Factor w/ 2 levels "Yes","No":
$ pvd : Factor w/ 2 levels "Yes","No":
$ isch.stroke : Factor w/ 2 levels "Yes","No":
$ ht.1 : Factor w/ 2 levels "Yes","No":
$ tertile : Factor w/ 3 levels "1","2","3":
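I would like to generate a data frame containing summary statistics for all the categorical variables, i.e. the proportion of patients, grouped by data.frame$tertile.
Is it possible to use ddply with categorical variables? I have managed to use ddply for the continuous variables: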
x <- ddply(data.frame, .(tertile), numcolwise(mean))
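However, I am finding it difficult to apply the catcolwise function together with ddply.
Thank you in advance; any replies are much appreciated.
Regards,
Anoop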
Answer 0 (score: 0)
To get the proportion of "Yes" (or "No"), count the number of times a logical comparison evaluates to TRUE (TRUE = 1, FALSE = 0):
nYes <- function(x) 100 * (sum(x == "Yes") / length(x))
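As a quick check of the counting idea, here is a minimal sketch on a hand-made vector. The vector x and the names n_yes and pct_yes are made up for illustration and are not part of the original answer.

```r
# Hypothetical example vector: three "Yes" out of four values
x <- c("Yes", "No", "Yes", "Yes")

# The comparison yields a logical vector; TRUE counts as 1 and FALSE as 0
is_yes <- x == "Yes"

# Summing the logical vector counts the "Yes" entries
n_yes <- sum(is_yes)               # 3

# Dividing by the length (or taking the mean) gives the proportion
pct_yes <- 100 * n_yes / length(x) # 75
```

Since mean() of a logical vector is the same ratio, 100 * mean(x == "Yes") is an equivalent way to write the percentage.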
Make some dummy data:
vec <- c("Yes","No")
vec2 <- c(1,2,3)
tmp <- data.frame("smoker" = sample(vec,10, replace=TRUE),
"mi" = sample(vec,10, replace=TRUE),
"tertile" = sample(vec2,10, replace=TRUE))
Then use ddply (from the plyr package):
ddply(tmp, .(tertile), colwise(nYes))
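If plyr is not available, roughly the same table can be produced with base R's aggregate(). This is only a sketch of an alternative, not the method used in this answer. The small data frame and the helper name pct_yes are assumptions chosen so the result is easy to verify by hand.

```r
# Hypothetical, deterministic data (the answer's dummy data is random)
tmp <- data.frame(
  smoker  = c("Yes", "No", "Yes", "No"),
  mi      = c("No",  "No", "Yes", "Yes"),
  tertile = c(1, 1, 2, 2)
)

# Illustrative helper: percentage of "Yes" values in a vector
pct_yes <- function(x) 100 * mean(x == "Yes")

# Apply the helper to each categorical column within each tertile
res <- aggregate(
  tmp[c("smoker", "mi")],
  by  = list(tertile = tmp$tertile),
  FUN = pct_yes
)
res
```

Here the columns to summarise are listed explicitly, so the grouping variable is never passed to the helper.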
Answer 1 (score: 0)
You could try the following (dat is defined further down):
fun1 <- function(x) round(100*(table(x)/length(x))[1],2)
ddply(dat, .(tertile),colwise(fun1) )
dat <- structure(list(smoker = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L,
1L, 2L, 2L), .Label = c("Yes", "No"), class = "factor"), mi = structure(c(1L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Yes", "No"), class = "factor"),
angina = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("Yes", "No"), class = "factor"), pvd = structure(c(2L,
2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L), .Label = c("Yes", "No"
), class = "factor"), isch.stroke = structure(c(1L, 1L, 1L,
2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("Yes", "No"), class = "factor"),
ht.1 = structure(c(1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L
), .Label = c("Yes", "No"), class = "factor"), tertile = structure(c(3L,
3L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 1L), .Label = c("1", "2",
"3"), class = "factor")), .Names = c("smoker", "mi", "angina",
"pvd", "isch.stroke", "ht.1", "tertile"), row.names = c(NA, -10L
), class = "data.frame")
ddply(dat, .(tertile),colwise(fun1) )
# tertile smoker mi angina pvd isch.stroke ht.1
#1 1 0.00 0.00 33.33 33.33 0.00 0
#2 2 0.00 100.00 0.00 0.00 0.00 0
#3 3 33.33 66.67 50.00 50.00 66.67 100
Or using dplyr:
library(dplyr)
dat%>%
group_by(tertile)%>%
summarise_each(funs(fun1))
#Source: local data frame [3 x 7]
# tertile smoker mi angina pvd isch.stroke ht.1
#1 1 0.00 0.00 33.33 33.33 0.00 0
#2 2 0.00 100.00 0.00 0.00 0.00 0
#3 3 33.33 66.67 50.00 50.00 66.67 100
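Note that summarise_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same grouping with across(), assuming dplyr 1.0.0 or later, could look like this. The small data frame and the helper pct_yes are made up for illustration. Unlike fun1, pct_yes compares against "Yes" directly, so it does not rely on "Yes" being the first factor level.

```r
library(dplyr)

# Hypothetical data, small enough to check by hand
dat_small <- data.frame(
  smoker  = factor(c("Yes", "No", "No",  "Yes"), levels = c("Yes", "No")),
  mi      = factor(c("No",  "No", "Yes", "Yes"), levels = c("Yes", "No")),
  tertile = factor(c("1", "1", "2", "2"))
)

# Illustrative helper: percentage of "Yes", rounded to two decimals
pct_yes <- function(x) round(100 * mean(x == "Yes"), 2)

# across(everything(), ...) covers all non-grouping columns
res <- dat_small %>%
  group_by(tertile) %>%
  summarise(across(everything(), pct_yes))
res
```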