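I'm trying to calculate the proportion of correct answers for each participant as a function of three factors (group, sound, and language). My data frame looks like this: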
participant group sound lang resp
advf03 adv a in 1
advf03 adv a sp 0
advf03 adv a in 1
advf03 adv a sp 0
advf03 adv a in 0
advf03 adv a sp 1
advf03 adv a sp 0
advf03 adv a in 1
advf03 adv a in 0
advf03 adv a in 1
begf03 beg a in 1
begf03 beg a in 1
begf03 beg a sp 0
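"group" has 3 levels: adv, int, and beg. "sound" has 3 levels: a, e, and i. "lang" has 2 levels: in and sp. A 1 means a correct response and a 0 means an incorrect response. I want the proportion of 1s for each participant (i.e., percent correct) as a new column in a new data frame. An example of the kind of information I'm after: participant advf03 got 53% correct on "a" in "sp".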
Here are 50 observations from my data:
structure(list(sound = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("a",
"e", "i"), class = "factor"), resp = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), participant = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("2advf03", "2advf05", "2advm04", "2advm06", "2begf01",
"2begf02", "2begf04", "2begf05", "2begm03", "2advf01", "2intf01",
"2intf03", "2intf04", "2intf06", "2advm05"), class = "factor"),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("adv",
"beg", "int"), class = "factor"), lang = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("in", "sp"), class = "factor")), .Names = c("sound",
"resp", "participant", "group", "lang"), row.names = c(10L, 31L,
36L, 43L, 47L, 49L, 52L, 59L, 61L, 65L, 66L, 68L, 71L, 79L, 97L,
99L, 106L, 125L, 133L, 138L, 147L, 149L, 162L, 165L, 174L, 175L,
33L, 37L, 112L, 136L, 154L, 186L, 11L, 50L, 89L, 92L, 104L, 105L,
123L, 126L, 129L, 143L, 153L, 173L, 177L, 187L, 188L, 191L, 7L,
12L), class = "data.frame")
Here's what I've done so far:
# get counts of subsets of factors
df <- as.data.frame(table(df))
# new column that gives the proportion of responses
df$prop <- df$Freq / 32
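But this doesn't seem to give me the right proportions. I know I need to reduce the data so that there aren't so many observations (i.e., each participant ends up with one value for each sound in each language), but I don't know the right steps for doing that.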
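Dividing every count by a fixed 32 only works if every participant × sound × language cell has exactly 32 trials. That need not hold: in the example rows shown at the top, advf03 has six "in" trials and four "sp" trials. A minimal sketch of repairing the table-based approach, assuming a toy data frame built from those example rows, is to divide each count by its own cell total with `prop.table()` and keep only the `resp == 1` rows. `group` is left out of the table here (crossing it with `participant` would only add empty cells); it could be merged back afterwards.

```r
# Toy data: the example rows shown in the question (not the 50-row dput)
df <- data.frame(
  participant = c(rep("advf03", 10), rep("begf03", 3)),
  group = c(rep("adv", 10), rep("beg", 3)),
  sound = "a",
  lang = c("in", "sp", "in", "sp", "in", "sp", "sp", "in", "in", "in",
           "in", "in", "sp"),
  resp = c(1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

# Count 0s and 1s in every participant x sound x lang cell
counts <- with(df, table(participant, sound, lang, resp))

# Divide each count by its own cell total (margins 1:3), not by a fixed 32
props <- prop.table(counts, margin = 1:3)

# Long format; keep the proportion of 1s and drop empty cells (0/0 = NaN)
prop_df <- as.data.frame(props, responseName = "prop")
prop_df <- subset(prop_df, resp == "1" & !is.nan(prop))
print(prop_df)
```

For advf03 this gives 4/6 for "in" and 1/4 for "sp", because each cell is divided by its own number of trials.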
Answer:
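If I understand your question correctly, you want the proportion of 1s by participant, sound, and language.
Since the proportion of 1s in a vector containing only 0s and 1s is simply its mean, this should work: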
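To make that step concrete: with 0/1 coding, `sum()` counts the correct answers and `length()` counts the trials, so `mean()` is exactly the proportion correct. A small check, using advf03's ten responses from the example rows in the question:

```r
# advf03's ten responses from the example rows (1 = correct, 0 = incorrect)
resp <- c(1, 0, 1, 0, 0, 1, 0, 1, 0, 1)

# The mean of a 0/1 vector equals (number of 1s) / (number of trials)
p_mean  <- mean(resp)
p_ratio <- sum(resp) / length(resp)
print(c(p_mean, p_ratio))  # both 0.5

# Multiply by 100 to get percent correct
print(100 * p_mean)  # 50
```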
aggregate(resp ~ participant + group + sound + lang, data = df, FUN = mean)
The result for the 50 observations is:
participant group sound lang resp
1 2advf03 adv a in 0.1875000
2 2advf03 adv a sp 0.1111111
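A self-contained sketch that goes one step further, assuming the same toy data built from the example rows in the question: `sound` is included in the formula so each sound gets its own proportion, a percent column is added, and, as an alternative not part of the answer above, base R's `ave()` attaches the cell proportion to every original trial row instead of collapsing the data. The column names `pct` and `prop` are illustrative choices.

```r
# Toy data: the example rows shown in the question (not the 50-row dput)
df <- data.frame(
  participant = c(rep("advf03", 10), rep("begf03", 3)),
  group = c(rep("adv", 10), rep("beg", 3)),
  sound = "a",
  lang = c("in", "sp", "in", "sp", "in", "sp", "sp", "in", "in", "in",
           "in", "in", "sp"),
  resp = c(1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

# One row per participant x group x sound x lang; resp becomes the mean of 0/1
res <- aggregate(resp ~ participant + group + sound + lang, data = df, FUN = mean)

# Express the proportion as percent correct (hypothetical column name "pct")
res$pct <- 100 * res$resp
print(res)

# Alternative: keep every trial and add its cell's proportion as a new column
df$prop <- ave(df$resp, df$participant, df$sound, df$lang, FUN = mean)
print(head(df))
```

`aggregate()` gives the new, reduced data frame the question asks for; `ave()` is handy when the proportion is needed alongside the trial-level data.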