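I have a dataset with n observations and three validation columns that indicate whether each observation can be included in the analysis. I would like to summarise the dataset for each filter column by summing each variable.
I am having a lot of trouble doing this with the reshape package. A sample dataset: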
dat <- data.frame(
  ID = c(1:20),
  Var1 = ifelse(runif(20, min = 0, max = 1) > 0.5, 1, 0),
  Var2 = ifelse(runif(20, min = 0, max = 1) > 0.5, 1, 0),
  Var3 = ifelse(runif(20, min = 0, max = 1) > 0.5, 1, 0),
  Filter1 = ifelse(runif(20, min = 0, max = 1) > 0.5, TRUE, FALSE),
  Filter2 = ifelse(runif(20, min = 0, max = 1) > 0.4, TRUE, FALSE),
  Filter3 = ifelse(runif(20, min = 0, max = 1) > 0.3, TRUE, FALSE)
)
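This returns a dataset like the following (the values are random, so they differ on each run unless a seed is set):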
ID Var1 Var2 Var3 Filter1 Filter2 Filter3
1 1 1 0 1 FALSE TRUE TRUE
2 2 1 1 1 FALSE FALSE FALSE
3 3 1 1 1 TRUE FALSE TRUE
4 4 1 0 0 TRUE TRUE TRUE
5 5 1 0 0 FALSE FALSE TRUE
6 6 1 1 1 FALSE TRUE FALSE
7 7 1 0 1 FALSE TRUE FALSE
8 8 0 1 1 FALSE TRUE TRUE
9 9 0 0 0 FALSE FALSE FALSE
10 10 1 0 1 FALSE TRUE TRUE
11 11 1 0 0 TRUE TRUE FALSE
12 12 0 1 1 FALSE FALSE TRUE
13 13 0 0 0 TRUE TRUE TRUE
14 14 0 1 1 FALSE TRUE FALSE
15 15 0 0 0 FALSE FALSE FALSE
16 16 1 1 0 TRUE FALSE TRUE
17 17 0 1 0 TRUE FALSE FALSE
18 18 1 1 0 FALSE FALSE TRUE
19 19 1 0 0 FALSE FALSE TRUE
20 20 0 1 0 TRUE TRUE TRUE
For each filter, I want the sum of each variable, split by whether the filter is TRUE or FALSE, like this:
Filter Variable True False
1 Filter1 Var1 2 1
2 Var2 3 0
3 Var3 1 1
4 Filter2 Var1 1 2
5 Var2 2 1
6 Var3 1 1
7 Filter3 Var1 1 2
8 Var2 1 2
9 Var3 0 2
Thanks for your help, it is much appreciated.
Answer 0: (score: 4)
This can be done more concisely with dplyr and tidyr:
library(dplyr)
library(tidyr)
dat <- data.frame(
  ID = c(1:20),
  Var1 = sample(0:1, 20, replace = TRUE),
  Var2 = sample(0:1, 20, replace = TRUE),
  Var3 = sample(0:1, 20, replace = TRUE),
  Filter1 = sample(0:1, 20, replace = TRUE) %>% as.logical,
  Filter2 = sample(0:1, 20, replace = TRUE, prob = c(0.6, 0.4)) %>% as.logical,
  Filter3 = sample(0:1, 20, replace = TRUE, prob = c(0.7, 0.3)) %>% as.logical
)
dat %>%
  gather(Filter, FilterTF, Filter1:Filter3) %>%   # one row per ID and filter
  gather(Variable, Value, Var1:Var3) %>%          # one row per ID, filter and variable
  group_by(Filter, FilterTF, Variable) %>%
  summarize(Sum = sum(Value)) %>%
  spread(FilterTF, Sum, fill = 0)                 # FALSE / TRUE become columns
## Source: local data frame [9 x 4]
##
## Filter Variable FALSE TRUE
## 1 Filter1 Var1 5 8
## 2 Filter1 Var2 4 8
## 3 Filter1 Var3 4 5
## 4 Filter2 Var1 5 8
## 5 Filter2 Var2 5 7
## 6 Filter2 Var3 5 4
## 7 Filter3 Var1 8 5
## 8 Filter3 Var2 8 4
## 9 Filter3 Var3 6 3
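In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same pipeline with the newer verbs follows. The set.seed() call and the name `result` are my own additions for reproducibility and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible

dat <- data.frame(
  ID = 1:20,
  Var1 = sample(0:1, 20, replace = TRUE),
  Var2 = sample(0:1, 20, replace = TRUE),
  Var3 = sample(0:1, 20, replace = TRUE),
  Filter1 = as.logical(sample(0:1, 20, replace = TRUE)),
  Filter2 = as.logical(sample(0:1, 20, replace = TRUE)),
  Filter3 = as.logical(sample(0:1, 20, replace = TRUE))
)

result <- dat %>%
  # stack the three filter columns: one row per ID and filter
  pivot_longer(Filter1:Filter3, names_to = "Filter", values_to = "FilterTF") %>%
  # stack the three variable columns: one row per ID, filter and variable
  pivot_longer(Var1:Var3, names_to = "Variable", values_to = "Value") %>%
  group_by(Filter, FilterTF, Variable) %>%
  summarise(Sum = sum(Value), .groups = "drop") %>%
  # spread the FALSE / TRUE sums into their own columns
  pivot_wider(names_from = FilterTF, values_from = Sum, values_fill = 0)

print(result)
```

The result should have the same shape as the output above: one row per filter and variable, with FALSE and TRUE columns holding the sums.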
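Answer 1: (score: 0)
Somehow I managed to answer my own question with a bit of extra research ;-) Doh.
First, melt the filter columns so that each filter gets its own row: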
require(reshape2)
require(plyr)
Long <- melt(dat, id=c(1:4))
Long <- rename(Long,c("variable"="Filter","value"="FilterTF"))
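Then I melt the data a second time, keeping the TRUE/FALSE component as an id column so that the variables are stacked as well: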
Longer <- melt(Long,id=c("ID","Filter","FilterTF"))
ddply takes care of the summary statistics:
Stats <- ddply(Longer, .(Filter, FilterTF, variable), summarise,
               Sum = sum(value))
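Now I use reshape2 to get the desired format (note that the argument is value.var and takes the column name as a string):
dcast(Stats, Filter + variable ~ FilterTF, value.var = "Sum")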
Filter variable FALSE TRUE
1 Filter1 Var1 2 1
2 Filter1 Var2 3 0
3 Filter1 Var3 1 1
4 Filter2 Var1 1 2
5 Filter2 Var2 2 1
6 Filter2 Var3 1 1
7 Filter3 Var1 1 2
8 Filter3 Var2 1 2
9 Filter3 Var3 0 2
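As a sanity check on the reshaped table, the same numbers can be computed directly in base R, without reshape2 or plyr. This is only a sketch: the seed, the helper names `vars`, `filters` and `check`, and the column names True and False are my own choices for illustration.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible

dat <- data.frame(
  ID = 1:20,
  Var1 = ifelse(runif(20) > 0.5, 1, 0),
  Var2 = ifelse(runif(20) > 0.5, 1, 0),
  Var3 = ifelse(runif(20) > 0.5, 1, 0),
  Filter1 = runif(20) > 0.5,
  Filter2 = runif(20) > 0.4,
  Filter3 = runif(20) > 0.3
)

vars <- c("Var1", "Var2", "Var3")
filters <- c("Filter1", "Filter2", "Filter3")

# For each filter, sum every variable over the rows where the filter is
# TRUE and, separately, over the rows where it is FALSE.
check <- do.call(rbind, lapply(filters, function(f) {
  keep <- dat[[f]]
  data.frame(
    Filter = f,
    Variable = vars,
    True = colSums(dat[keep, vars, drop = FALSE]),
    False = colSums(dat[!keep, vars, drop = FALSE]),
    row.names = NULL
  )
}))

print(check)

# A single cell can also be checked by hand, for example Filter1 / Var1:
sum(dat$Var1[dat$Filter1])
```

Each True value here should match the TRUE column of the dcast output when both are run on the same `dat`.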