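My goal: given a data frame of dichotomous responses (e.g., 0s and 1s), how can I produce a summary matrix that 1) has two columns (one for answering the first question correctly, the other for answering it incorrectly), and 2) has rows giving the number of individuals who obtained each particular sum score?
For example, say I have 50 respondents and 5 questions. That means there are 6 response patterns by sum score (all incorrect/0; then one, two, three, and four correct; and finally all correct/1). I would like the resulting matrix object to look like: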
      INCORRECT   CORRECT   <-- a 0 or 1 on the first item, respectively
[1]      10          0      <-- the 10 people who answered 0 on the first question and 0 on all the others (5 zeroes)
[2]       8          2      <-- the 10 people who got 1 correct (8 got the first question incorrect, 2 got the first question correct)
[3]       4          8      <-- the 12 people who got 2 correct (4 got the first question incorrect but 2 of the other questions correct, 8 got the first question and 1 other correct)
[4]       6          3      <-- the 9 people who got 3 correct
[5]       3          4      <-- the 7 people who got 4 correct
[6]       0          8      <-- the 8 people who answered all 5 questions correctly (so they necessarily got the first question correct)
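My thinking is that I need to split the data frame by performance on the first question (working one column at a time), find the total score for each row (participant), and tabulate those totals to make the first column; then do the same for the second?
This is going to be built into a package, so I'm trying to figure out how to do this using only base functions.
Here is a sample dataset similar to what I'll be working with: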
n <- 50
z <- c(0, 1)
samp.fun <- function(x, n){
  sample(x, n, replace = TRUE)
}
data <- data.frame(0)
for (i in 1:5){
  data[1:n, i] <- samp.fun(z, n)
}
names(data)[1:5] <- c("x1", "x2", "x3", "x4", "x5")
Any ideas would be greatly appreciated!
Answer 0 (score: 4)
Using @alexwhan's data, here is a data.table solution:
require(data.table)
dt <- data.table(data)
dt[, list(x1.incorrect=sum(x1==0), x1.correct=sum(x1==1)), keyby=total]
# total x1.incorrect x1.correct
# 1: 0 2 0
# 2: 1 7 1
# 3: 2 9 8
# 4: 3 7 6
# 5: 4 0 7
# 6: 5 0 3
Equivalently, you can get the result more directly with table and as.list, if you don't mind setting the column names afterwards, as follows:
dt[, as.list(table(factor(x1, levels=c(0,1)))), keyby=total]
# total 0 1
# 1: 0 2 0
# 2: 1 7 1
# 3: 2 9 8
# 4: 3 7 6
# 5: 4 0 7
# 6: 5 0 3
Note: you can wrap as.list(.) with setNames(), like:
dt[, setNames(as.list(table(factor(x1, levels=c(0,1)))),
c("x1.incorrect", "x1.correct")), keyby = total]
This sets the column names in one go as well.
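The calls above group by a total column, which comes from @alexwhan's data rather than from the question's raw data frame. As a minimal sketch, assuming you start from only the five 0/1 item columns, the sum score could be added in data.table first. The simulated data, the seed, and the names items and res are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Hypothetical stand-in for the question's data: five 0/1 item columns.
set.seed(1)
items <- paste0("x", 1:5)
dt <- as.data.table(
  setNames(replicate(5, sample(0:1, 50, replace = TRUE), simplify = FALSE), items)
)

# Add each respondent's sum score by reference, using only the item columns.
dt[, total := rowSums(.SD), .SDcols = items]

# Same grouped summary as in the answer, now that `total` exists.
res <- dt[, list(x1.incorrect = sum(x1 == 0), x1.correct = sum(x1 == 1)),
          keyby = total]
res
```

Note that only sum scores that actually occur in the data appear as rows here, so a score nobody obtained is simply absent.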
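Answer 1 (score: 3)
Because you didn't use set.seed when creating your data, I can't check this solution against your example, but I think this is what you're after. I'm using functions from reshape2 and plyr to summarise the data.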
library(reshape2)
library(plyr)
#create data
set.seed(1234)
n <- 50
z <- c(0, 1)
samp.fun <- function(x, n){
  sample(x, n, replace = TRUE)
}
data <- data.frame(0)
for (i in 1:5){
  data[1:n, i] <- samp.fun(z, n)
}
names(data)[1:5] <- c("x1", "x2", "x3", "x4", "x5")
data$id <- 1:50
#First get the long form to make summaries on
data.m <- melt(data, id.vars="id")
#Get summary to find total correct answers
data.sum <- ddply(data.m, .(id), summarise,
total = sum(value))
#merge back with original data to associate with id
data <- merge(data, data.sum)
data$total <- factor(data$total)
#summarise again to get difference between patterns
data.sum2 <- ddply(data, .(total), summarise,
x1.incorrect = length(total) - sum(x1),
x1.correct = sum(x1))
data.sum2
# total x1.incorrect x1.correct
# 1 0 2 0
# 2 1 7 1
# 3 2 9 8
# 4 3 7 6
# 5 4 0 7
# 6 5 0 3
Answer 2 (score: -1)
Nice puzzle - if I've got it right, this should do it as well:
table(rowSums(data),data[,1])
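One caveat with the one-liner: it only gives the intended result when data holds nothing but the item columns. After the merge in the other answer, data also carries id and total, and column 1 is id. Here is a base-only sketch that guards against that and always returns every sum score from 0 to k, including scores nobody obtained. The helper name score_by_item, its item argument, and the toy data are my own assumptions for illustration.

```r
# Hypothetical helper (not from the original answers): rows are sum scores
# 0..k, columns are incorrect/correct on the chosen item. Base R only.
score_by_item <- function(items, item = 1) {
  k <- ncol(items)
  # Fixing the factor levels keeps rows and columns with zero counts.
  total <- factor(rowSums(items), levels = 0:k)
  resp  <- factor(items[[item]], levels = 0:1,
                  labels = c("incorrect", "correct"))
  unclass(table(total = total, response = resp))
}

# Tiny hand-checkable example: 4 respondents, 3 items.
toy <- data.frame(x1 = c(0, 1, 1, 0),
                  x2 = c(0, 1, 0, 1),
                  x3 = c(0, 1, 0, 0))
res <- score_by_item(toy)
res
```

With the question's data you would pass only the item columns, e.g. score_by_item(data[paste0("x", 1:5)]), and change item to tabulate against a different question.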