Question


I have a data.frame (structure below) that consists mainly of categorical variables. Most are binary (Yes or No), and one has three levels (data.frame$tertile).

'data.frame':
 $ smoker        : Factor w/ 2 levels "Yes","No"
 $ mi            : Factor w/ 2 levels "Yes","No":
 $ angina        : Factor w/ 2 levels "Yes","No":
 $ pvd           : Factor w/ 2 levels "Yes","No":
 $ isch.stroke   : Factor w/ 2 levels "Yes","No":
 $ ht.1          : Factor w/ 2 levels "Yes","No":
 $ tertile       : Factor w/ 3 levels "1","2","3":

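I would like to produce a data frame containing summary statistics for all of the categorical variables, i.e. the proportion of patients in each category, grouped by data.frame$tertile.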

Is it possible to use ddply with categorical variables? I have managed to use ddply for the continuous variables:

library(plyr)
x <- ddply(data.frame, .(tertile), numcolwise(mean))

but I am finding it difficult to apply the catcolwise function together with ddply.
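A minimal sketch of how catcolwise() can be combined with ddply(), using small hypothetical data (`df` and `pct_yes` are illustrative names, not from the question). catcolwise() wraps a function so that it is applied to every discrete column (factor, character or logical); inside ddply() the grouping column is stripped before the function is applied and then added back as a label.

```r
library(plyr)

# Hypothetical example data with the same shape as the question's:
# two Yes/No factors plus a three-level tertile factor.
df <- data.frame(
  smoker  = factor(c("Yes", "No", "No", "Yes", "No", "No"), levels = c("Yes", "No")),
  mi      = factor(c("No", "No", "Yes", "Yes", "Yes", "No"), levels = c("Yes", "No")),
  tertile = factor(c("1", "1", "2", "2", "3", "3"))
)

# Percentage of "Yes" values in one factor or character vector
pct_yes <- function(x) 100 * mean(x == "Yes")

# catcolwise() applies pct_yes to each categorical column;
# ddply() splits the rows by tertile first.
res <- ddply(df, .(tertile), catcolwise(pct_yes))
print(res)
```

Because pct_yes compares against the label "Yes" rather than a position, it does not depend on the order of the factor levels.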

Thanks in advance; I appreciate any replies.

Regards,

Anoop

Answer 1

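To get the proportion of "Yes" (or "No"), count how many times a logical comparison evaluates to TRUE (TRUE counts as 1, FALSE as 0):

nYes <- function(x) 100 * sum(x == "Yes") / length(x)  # percentage of "Yes" values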
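The TRUE = 1, FALSE = 0 coercion that nYes relies on can be seen on a toy vector (the names `answers`, `is_yes`, `n_yes` and `pct_yes` are only for illustration). Taking mean() of the logical vector gives the proportion directly.

```r
# A comparison returns a logical vector; arithmetic coerces TRUE to 1 and FALSE to 0
answers <- c("Yes", "No", "Yes", "Yes")
is_yes  <- answers == "Yes"      # TRUE FALSE TRUE TRUE
n_yes   <- sum(is_yes)           # number of "Yes" values
pct_yes <- 100 * mean(is_yes)    # same as 100 * sum(is_yes) / length(is_yes)
print(c(n_yes = n_yes, pct_yes = pct_yes))
```

If a column can contain NA, the comparison yields NA for those entries and sum() returns NA; mean(x == "Yes", na.rm = TRUE) drops them from both the numerator and the denominator.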

Make some dummy data:

vec <- c("Yes","No")
vec2 <- c(1,2,3)
tmp <- data.frame("smoker" = sample(vec,10, replace=TRUE),
             "mi" = sample(vec,10, replace=TRUE),
             "tertile" = sample(vec2,10, replace=TRUE))

Then use ddply (from the plyr package):

library(plyr)
ddply(tmp, .(tertile), colwise(nYes))
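As a sanity check, the same per-tertile percentages can be reproduced with base R's aggregate(), which applies a function to each column of a data frame within each group. This is a sketch rather than part of the answer: set.seed(1) is an arbitrary choice to make the random dummy data reproducible, and `via_plyr`/`via_base` are illustrative names.

```r
library(plyr)

nYes <- function(x) 100 * sum(x == "Yes") / length(x)  # percentage of "Yes" values

set.seed(1)  # arbitrary seed, only to make the dummy data reproducible
vec <- c("Yes", "No")
tmp <- data.frame(smoker  = sample(vec, 10, replace = TRUE),
                  mi      = sample(vec, 10, replace = TRUE),
                  tertile = sample(1:3, 10, replace = TRUE))

# plyr version from the answer
via_plyr <- ddply(tmp, .(tertile), colwise(nYes))

# Base-R equivalent: nYes applied to each column within each tertile
via_base <- aggregate(tmp[c("smoker", "mi")],
                      by = list(tertile = tmp$tertile), FUN = nYes)

print(via_plyr)
print(via_base)
```

Both functions return one row per tertile that occurs in the data, sorted in ascending order, so the two results can be compared row by row.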

Answer 2

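You could try the following, where fun1 returns the percentage of observations in the first factor level ("Yes" for these data):

library(plyr)
fun1 <- function(x) round(100*(table(x)/length(x))[1],2)
ddply(dat, .(tertile), colwise(fun1))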
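Note that fun1 picks the first entry of table(), which follows the level order: for a factor that is the declared order of its levels, and for a character vector the levels are sorted alphabetically, so "No" comes before "Yes". The sketch below illustrates this; `pct_level` is a hypothetical variant that names the level explicitly.

```r
# Answer 2's helper: percentage of observations in the FIRST level
fun1 <- function(x) round(100 * (table(x) / length(x))[1], 2)

# Hypothetical variant that names the level instead of relying on its position
pct_level <- function(x, level = "Yes") round(100 * mean(x == level), 2)

x_yes_first <- factor(c("Yes", "No", "No"), levels = c("Yes", "No"))
x_no_first  <- factor(c("Yes", "No", "No"), levels = c("No", "Yes"))
x_chr       <- c("Yes", "No", "No")  # plain character: levels sorted, "No" first

fun1(x_yes_first)      # share of "Yes"
fun1(x_no_first)       # share of "No", because "No" is the first level
fun1(x_chr)            # also the share of "No"
pct_level(x_no_first)  # share of "Yes" regardless of level order
```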

Data:

dat <- structure(list(smoker = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 2L, 2L), .Label = c("Yes", "No"), class = "factor"), mi = structure(c(1L, 
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Yes", "No"), class = "factor"), 
angina = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("Yes", "No"), class = "factor"), pvd = structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L), .Label = c("Yes", "No"
), class = "factor"), isch.stroke = structure(c(1L, 1L, 1L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("Yes", "No"), class = "factor"), 
ht.1 = structure(c(1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L
), .Label = c("Yes", "No"), class = "factor"), tertile = structure(c(3L, 
3L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 1L), .Label = c("1", "2", 
"3"), class = "factor")), .Names = c("smoker", "mi", "angina", 
"pvd", "isch.stroke", "ht.1", "tertile"), row.names = c(NA, -10L
), class = "data.frame")


  ddply(dat, .(tertile),colwise(fun1) )
#  tertile smoker     mi angina   pvd isch.stroke ht.1
#1       1   0.00   0.00  33.33 33.33        0.00    0
#2       2   0.00 100.00   0.00  0.00        0.00    0
#3       3  33.33  66.67  50.00 50.00       66.67  100

Or, using dplyr:

library(dplyr)
dat %>%
  group_by(tertile) %>%
  summarise_each(funs(fun1))
#Source: local data frame [3 x 7]
#
#   tertile smoker     mi angina   pvd isch.stroke ht.1
#1       1   0.00   0.00  33.33 33.33        0.00    0
#2       2   0.00 100.00   0.00  0.00        0.00    0
#3       3  33.33  66.67  50.00 50.00       66.67  100
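In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated in favour of across(). A sketch of the equivalent call, on a small hypothetical data set with the same column types as `dat` (the data values are made up for illustration):

```r
library(dplyr)

# Percentage of observations in the first factor level (as in Answer 2)
fun1 <- function(x) round(100 * (table(x) / length(x))[1], 2)

# Hypothetical small data set with the same column types as `dat`
dat <- data.frame(
  smoker  = factor(c("No", "Yes", "No", "No"), levels = c("Yes", "No")),
  mi      = factor(c("Yes", "Yes", "No", "No"), levels = c("Yes", "No")),
  tertile = factor(c("1", "1", "2", "2"))
)

# across() replaces summarise_each()/funs(); inside a grouped summarise(),
# everything() automatically leaves out the grouping column tertile
res <- dat %>%
  group_by(tertile) %>%
  summarise(across(everything(), fun1))
print(res)
```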

R: Using ddply on a data frame with categorical variables
