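Summary statistics conditional on three separate columns

Date: 2014-03-16 11:32:15

Tags: r reshape reshape2

I have a dataset with n observations and three validation columns that indicate whether each observation may be included in the analysis. I would like to summarize the dataset for each filter column by summing each variable.

I am having a lot of difficulty with the reshape package. A sample dataset follows: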

dat <- data.frame (
  ID = c(1:20),
  Var1 = ifelse(runif(20, min = 0, max = 1) > 0.5,1,0),
  Var2 = ifelse(runif(20, min = 0, max = 1) > 0.5,1,0),
  Var3 = ifelse(runif(20, min = 0, max = 1) > 0.5,1,0),
  Filter1 = ifelse(runif(20, min = 0, max = 1) > 0.5,TRUE,FALSE),
  Filter2 = ifelse(runif(20, min = 0, max = 1) > 0.4,TRUE,FALSE),
  Filter3 = ifelse(runif(20, min = 0, max = 1) > 0.3,TRUE,FALSE)
)

This returns a dataset like the following (the values are random, so they differ between runs):

   ID Var1 Var2 Var3 Filter1 Filter2   Filter3
1   1    1    0    1   FALSE    TRUE      TRUE
2   2    1    1    1   FALSE   FALSE     FALSE
3   3    1    1    1    TRUE   FALSE      TRUE
4   4    1    0    0    TRUE    TRUE      TRUE
5   5    1    0    0   FALSE   FALSE      TRUE
6   6    1    1    1   FALSE    TRUE     FALSE
7   7    1    0    1   FALSE    TRUE     FALSE
8   8    0    1    1   FALSE    TRUE      TRUE
9   9    0    0    0   FALSE   FALSE     FALSE
10 10    1    0    1   FALSE    TRUE      TRUE
11 11    1    0    0    TRUE    TRUE     FALSE
12 12    0    1    1   FALSE   FALSE      TRUE
13 13    0    0    0    TRUE    TRUE      TRUE
14 14    0    1    1   FALSE    TRUE     FALSE
15 15    0    0    0   FALSE   FALSE     FALSE
16 16    1    1    0    TRUE   FALSE      TRUE
17 17    0    1    0    TRUE   FALSE     FALSE
18 18    1    1    0   FALSE   FALSE      TRUE
19 19    1    0    0   FALSE   FALSE      TRUE
20 20    0    1    0    TRUE    TRUE      TRUE

For each filter, I would like the sum of each variable, split by whether the filter is TRUE or FALSE, as follows:

  Filter      Variable True False
1 Filter1     Var1     2    1
2             Var2     3    0
3             Var3     1    1
4 Filter2     Var1     1    2
5             Var2     2    1
6             Var3     1    1
7 Filter3     Var1     1    2
8             Var2     1    2
9             Var3     0    2

Thanks for your help, it is much appreciated.

2 Answers:

Answer 0 (score: 4)

This can be done more concisely with dplyr and tidyr:

library(dplyr)
library(tidyr)

dat <- data.frame (
  ID = c(1:20),
  Var1 = sample(0:1,20,replace = TRUE),
  Var2 = sample(0:1,20,replace = TRUE),
  Var3 = sample(0:1,20,replace = TRUE),
  Filter1 = sample(0:1,20,replace = TRUE) %>% as.logical,
  Filter2 = sample(0:1,20,replace = TRUE,prob = c(0.6,0.4)) %>% as.logical,
  Filter3 = sample(0:1,20,replace = TRUE,prob = c(0.7,0.3)) %>% as.logical
  )

dat %>%
  gather(Filter, FilterTF, Filter1:Filter3) %>%
  gather(Variable, Value, Var1:Var3) %>%
  group_by(Filter, FilterTF, Variable) %>%
  summarize(Sum = sum(Value)) %>%
  spread(FilterTF, Sum, fill = 0)

## Source: local data frame [9 x 4]
## 
##    Filter Variable FALSE TRUE
## 1 Filter1     Var1     5    8
## 2 Filter1     Var2     4    8
## 3 Filter1     Var3     4    5
## 4 Filter2     Var1     5    8
## 5 Filter2     Var2     5    7
## 6 Filter2     Var3     5    4
## 7 Filter3     Var1     8    5
## 8 Filter3     Var2     8    4
## 9 Filter3     Var3     6    3
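
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Below is a sketch of the same pipeline in that style. It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument). The seed is an addition for repeatability, so the resulting numbers will not match the output above.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed, only so the example is repeatable

dat <- data.frame(
  ID = 1:20,
  Var1 = sample(0:1, 20, replace = TRUE),
  Var2 = sample(0:1, 20, replace = TRUE),
  Var3 = sample(0:1, 20, replace = TRUE),
  Filter1 = as.logical(sample(0:1, 20, replace = TRUE)),
  Filter2 = as.logical(sample(0:1, 20, replace = TRUE, prob = c(0.6, 0.4))),
  Filter3 = as.logical(sample(0:1, 20, replace = TRUE, prob = c(0.7, 0.3)))
)

res <- dat %>%
  # one row per (ID, Filter): stack the three filter columns
  pivot_longer(Filter1:Filter3, names_to = "Filter", values_to = "FilterTF") %>%
  # one row per (ID, Filter, Variable): stack the three value columns
  pivot_longer(Var1:Var3, names_to = "Variable", values_to = "Value") %>%
  group_by(Filter, FilterTF, Variable) %>%
  summarise(Sum = sum(Value), .groups = "drop") %>%
  # spread the TRUE/FALSE sums into two columns
  pivot_wider(names_from = FilterTF, values_from = Sum, values_fill = 0L)

print(res)
```

The result has one row per Filter/Variable pair and columns named `FALSE` and `TRUE`, as in the gather()/spread() version.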

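Answer 1 (score: 0)

Somehow I managed to answer my own question after some additional research ;-) Doh.

First, melt the data so that each filter column gets its own row: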

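library(reshape2)
library(plyr)
# Keep ID and Var1..Var3 (columns 1 to 4) as identifiers; stack Filter1..Filter3
Long <- melt(dat, id.vars = 1:4)
# plyr::rename takes a named vector of the form c(old = new)
Long <- rename(Long, c("variable" = "Filter", "value" = "FilterTF"))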

Then I melt the data again, this time stacking Var1 to Var3 while keeping the TRUE/FALSE filter value as an identifier:

Longer <- melt(Long,id=c("ID","Filter","FilterTF"))

ddply takes care of the summary statistics:

Stats <- ddply(Longer,.(Filter,FilterTF,variable), summarise, 
                   Sum = sum(value))

Now I use reshape2 to get the desired format:

# value.var names the column whose values fill the cells
dcast(Stats, Filter + variable ~ FilterTF, value.var = "Sum")


   Filter variable FALSE TRUE
1 Filter1     Var1     2    1
2 Filter1     Var2     3    0
3 Filter1     Var3     1    1
4 Filter2     Var1     1    2
5 Filter2     Var2     2    1
6 Filter2     Var3     1    1
7 Filter3     Var1     1    2
8 Filter3     Var2     1    2
9 Filter3     Var3     0    2
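
As a cross-check on either answer, the same table can also be computed without any packages. This is a minimal base R sketch and not part of the original answers. `summarise_by_filter` is a hypothetical helper name, and the seed is only there to make the example repeatable.

```r
set.seed(1)  # assumption: fixed seed, only so the example is repeatable

dat <- data.frame(
  ID = 1:20,
  Var1 = as.integer(runif(20) > 0.5),
  Var2 = as.integer(runif(20) > 0.5),
  Var3 = as.integer(runif(20) > 0.5),
  Filter1 = runif(20) > 0.5,
  Filter2 = runif(20) > 0.4,
  Filter3 = runif(20) > 0.3
)

# Hypothetical helper: for each filter column, sum every variable
# separately over the rows where the filter is FALSE and where it is TRUE.
summarise_by_filter <- function(dat, vars, filters) {
  do.call(rbind, lapply(filters, function(f) {
    keep <- dat[[f]]
    data.frame(
      Filter = f,
      Variable = vars,
      `FALSE` = colSums(dat[!keep, vars, drop = FALSE]),
      `TRUE` = colSums(dat[keep, vars, drop = FALSE]),
      check.names = FALSE,  # keep the column names FALSE and TRUE as they are
      row.names = NULL
    )
  }))
}

vars <- c("Var1", "Var2", "Var3")
filters <- c("Filter1", "Filter2", "Filter3")
res <- summarise_by_filter(dat, vars, filters)
print(res)
```

For every row, the FALSE and TRUE columns should add up to the overall sum of that variable, which is a quick way to sanity-check the output of the reshape-based pipelines.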