R: Using ddply on a data frame with categorical variables

Time: 2014-08-07 19:45:20

Tags: r plyr

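Straight to the question.

I have a data.frame (structure below) made up mainly of categorical variables: most are binary (Yes or No), and one has three levels (data.frame$tertile).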

'data.frame'::

 $ smoker        : Factor w/ 2 levels "Yes","No"

 $ mi            : Factor w/ 2 levels "Yes","No":

 $ angina        : Factor w/ 2 levels "Yes","No": 

 $ pvd           : Factor w/ 2 levels "Yes","No": 

 $ isch.stroke   : Factor w/ 2 levels "Yes","No": 

 $ ht.1          : Factor w/ 2 levels "Yes","No": 

 $ tertile       : Factor w/ 3 levels "1","2","3": 

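I would like to produce a data frame with summary statistics for all of the categorical variables, i.e. the proportion of patients, grouped by data.frame$tertile.

Is it possible to use ddply with categorical variables? I have managed to use ddply for the continuous variables: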

x <- ddply(data.frame, .(tertile), numcolwise(mean))

but I am finding it difficult to apply the catcolwise function together with ddply.

Thanks in advance; I appreciate any replies.

Regards

Anoop

2 answers:

Answer 0: (score: 0)

To get the proportion of "Yes" and "No", count the number of times the logical comparison evaluates to TRUE (TRUE = 1, FALSE = 0):

nYes <- function(x) 100*(sum(x=="Yes")/length(x))
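
A minimal base R sketch of the idea, in case it helps: comparing a vector with "Yes" gives a logical vector, and summing or averaging it treats TRUE as 1 and FALSE as 0. The helper name `pct_yes` and the `answers` vector are made up for illustration and are not part of plyr or the original post.

```r
# Hypothetical helper (illustration only): percentage of "Yes" values.
pct_yes <- function(x) 100 * mean(x == "Yes")

# Made-up example data.
answers <- c("Yes", "No", "Yes", "Yes")

is_yes <- answers == "Yes"   # logical vector, one entry per answer
n_yes  <- sum(is_yes)        # each TRUE counts as 1
pct    <- pct_yes(answers)   # same as 100 * n_yes / length(answers)
print(pct)
```

The same comparison works on a factor column, since `==` compares against the level labels.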

Make some dummy data

vec <- c("Yes","No")
vec2 <- c(1,2,3)
tmp <- data.frame("smoker" = sample(vec,10, replace=TRUE),
             "mi" = sample(vec,10, replace=TRUE),
             "tertile" = sample(vec2,10, replace=TRUE))

Then use ddply

ddply(tmp, .(tertile), colwise(nYes))
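
Since the question asks about catcolwise specifically, here is a hedged sketch of how it might be combined with ddply. As far as I know, `catcolwise()` in plyr wraps a function so that it is applied only to discrete (factor, character or logical) columns, which matters when the data frame also has numeric columns. The `toy` data and the `pct_yes` helper are assumptions made up for this example.

```r
library(plyr)

# Hypothetical helper (illustration only): percentage of "Yes" values.
pct_yes <- function(x) 100 * mean(x == "Yes")

# Made-up data: two categorical columns, one numeric column, one grouping factor.
toy <- data.frame(
  smoker  = factor(c("Yes", "No", "Yes", "No"), levels = c("Yes", "No")),
  mi      = factor(c("No", "No", "Yes", "Yes"), levels = c("Yes", "No")),
  age     = c(54, 61, 47, 70),
  tertile = factor(c("1", "1", "2", "2"))
)

# catcolwise() restricts pct_yes to the discrete columns, so 'age' is skipped;
# ddply() applies the wrapped function to each tertile group.
res <- ddply(toy, .(tertile), catcolwise(pct_yes))
print(res)
```

With plain `colwise(pct_yes)` the numeric `age` column would be compared with "Yes" as well and simply come back as 0, so restricting to discrete columns keeps the summary cleaner.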

Answer 1: (score: 0)

You could try:

 fun1 <- function(x) round(100*(table(x)/length(x))[1],2)
 ddply(dat, .(tertile),colwise(fun1) )

Data

dat <- structure(list(smoker = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 2L, 2L), .Label = c("Yes", "No"), class = "factor"), mi = structure(c(1L, 
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Yes", "No"), class = "factor"), 
angina = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("Yes", "No"), class = "factor"), pvd = structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L), .Label = c("Yes", "No"
), class = "factor"), isch.stroke = structure(c(1L, 1L, 1L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("Yes", "No"), class = "factor"), 
ht.1 = structure(c(1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L
), .Label = c("Yes", "No"), class = "factor"), tertile = structure(c(3L, 
3L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 1L), .Label = c("1", "2", 
"3"), class = "factor")), .Names = c("smoker", "mi", "angina", 
"pvd", "isch.stroke", "ht.1", "tertile"), row.names = c(NA, -10L
), class = "data.frame")


  ddply(dat, .(tertile),colwise(fun1) )
#  tertile smoker     mi angina   pvd isch.stroke ht.1
#1       1   0.00   0.00  33.33 33.33        0.00    0
#2       2   0.00 100.00   0.00  0.00        0.00    0
#3       3  33.33  66.67  50.00 50.00       66.67  100
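
One thing worth noting about fun1: the `[1]` picks the first entry of the table, which is the "Yes" proportion only because "Yes" is the first factor level in dat. A small sketch of the difference, using a made-up factor whose levels are in the opposite order (the variable names here are illustrative only):

```r
# Made-up factor with "No" as the FIRST level.
x <- factor(c("No", "Yes", "No", "No"), levels = c("No", "Yes"))

# Position-based selection (as in fun1): returns the share of the first level, here "No".
by_position <- round(100 * (table(x) / length(x))[1], 2)

# Name-based selection: returns the share of "Yes" whatever the level order is.
by_name <- round(100 * unname(prop.table(table(x))["Yes"]), 2)

print(by_position)
print(by_name)
```

Selecting by name is a little more verbose but does not depend on how the factor levels happen to be ordered.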

Or using dplyr

 library(dplyr)
  dat%>%
  group_by(tertile)%>% 
  summarise_each(funs(fun1))
  #Source: local data frame [3 x 7]

 #   tertile smoker     mi angina   pvd isch.stroke ht.1
 #1       1   0.00   0.00  33.33 33.33        0.00    0
 #2       2   0.00 100.00   0.00  0.00        0.00    0
 #3       3  33.33  66.67  50.00 50.00       66.67  100
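
For readers on a newer dplyr: `summarise_each()` and `funs()` have since been deprecated. A rough equivalent, assuming dplyr 1.0.0 or later, uses `across()` inside `summarise()`. The `toy` data below is made up for illustration; the same pipeline should apply to dat.

```r
library(dplyr)

# Made-up data with the same shape as the question's data frame.
toy <- data.frame(
  smoker  = factor(c("Yes", "No", "Yes", "No"), levels = c("Yes", "No")),
  mi      = factor(c("No", "No", "Yes", "Yes"), levels = c("Yes", "No")),
  tertile = factor(c("1", "1", "2", "2"))
)

# across(everything(), ...) covers every non-grouping column;
# the lambda computes the percentage of "Yes" within each tertile.
res <- toy %>%
  group_by(tertile) %>%
  summarise(across(everything(), ~ round(100 * mean(.x == "Yes"), 2)))

print(res)
```

Here the lambda matches "Yes" by label rather than by position, so it does not depend on the order of the factor levels.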