R: calculating the proportion of correct responses as a function of two factors

Posted: 2013-12-05 19:05:55

Tags: r dataframe data-cleansing

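I am trying to calculate the proportion of correct answers for each participant as a function of three factors (group, sound, and language). My data frame looks like this: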

participant group   sound   lang    resp 
advf03      adv     a       in      1
advf03      adv     a       sp      0
advf03      adv     a       in      1
advf03      adv     a       sp      0
advf03      adv     a       in      0
advf03      adv     a       sp      1
advf03      adv     a       sp      0
advf03      adv     a       in      1
advf03      adv     a       in      0
advf03      adv     a       in      1
begf03      beg     a       in      1
begf03      beg     a       in      1
begf03      beg     a       sp      0

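"group" has 3 levels: adv, int, and beg. "sound" has 3 levels: a, e, i. "lang" has 2 levels: in and sp. A "1" means a correct response and a "0" an incorrect one. I want each participant's proportion of "1"s (i.e., percent correct) as a new column in a new data frame. An example of the kind of information I want: participant advf03 was 53% correct on "a" in "sp".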
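To pin down the quantity being asked for, here is a small hedged sketch that computes it by hand for one cell, using only the sample rows listed above (the vector is copied from those rows; the variable names are illustrative). Proportion correct is the number of 1s divided by the number of trials in that participant/sound/language cell.

```r
# resp values of advf03 for sound "a" in lang "sp", copied from the sample rows above
resp_advf03_a_sp <- c(0, 0, 1, 0)

n_correct <- sum(resp_advf03_a_sp == 1)  # number of correct trials in the cell
n_trials  <- length(resp_advf03_a_sp)    # number of trials in the cell
prop      <- n_correct / n_trials        # proportion correct
pct       <- 100 * prop                  # percent correct
```

On these few rows that is 1 of 4, i.e. 25%; the 53% in the question is an example of the desired output format, not something these rows reproduce.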

Here are 50 observations from my data:

structure(list(sound = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("a", 
"e", "i"), class = "factor"), resp = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), participant = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("2advf03", "2advf05", "2advm04", "2advm06", "2begf01", 
"2begf02", "2begf04", "2begf05", "2begm03", "2advf01", "2intf01", 
"2intf03", "2intf04", "2intf06", "2advm05"), class = "factor"), 
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("adv", 
"beg", "int"), class = "factor"), lang = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("in", "sp"), class = "factor")), .Names = c("sound", 
"resp", "participant", "group", "lang"), row.names = c(10L, 31L, 
36L, 43L, 47L, 49L, 52L, 59L, 61L, 65L, 66L, 68L, 71L, 79L, 97L, 
99L, 106L, 125L, 133L, 138L, 147L, 149L, 162L, 165L, 174L, 175L, 
33L, 37L, 112L, 136L, 154L, 186L, 11L, 50L, 89L, 92L, 104L, 105L, 
123L, 126L, 129L, 143L, 153L, 173L, 177L, 187L, 188L, 191L, 7L, 
12L), class = "data.frame")

Here is what I have done so far:

# get counts of subsets of factors
df <- as.data.frame(table(df))

# new column that gives the proportion of responses
df$prop <- df$Freq / 32

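But this does not seem to give me the correct proportions. I know I need to reduce the data so that I don't have so many observations (i.e., each participant should have one value for each sound in each language), but I don't know the right steps to do this.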
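Dividing by a fixed 32 cannot work because the number of trials differs from cell to cell: table(df) cross-tabulates every column, resp included, and in the 50-observation sample the "in" rows number 32 while the "sp" rows number only 18. A minimal sketch that keeps the table-based idea, assuming base R's xtabs(): count the correct responses and the trials per participant x sound x lang cell separately, then divide. It runs on the sample rows from the top of the question; the names d, n_correct, n_trials, prop and prop_df are illustrative.

```r
# The example rows from the question, rebuilt as a data frame
d <- read.table(header = TRUE, text = "
participant group sound lang resp
advf03 adv a in 1
advf03 adv a sp 0
advf03 adv a in 1
advf03 adv a sp 0
advf03 adv a in 0
advf03 adv a sp 1
advf03 adv a sp 0
advf03 adv a in 1
advf03 adv a in 0
advf03 adv a in 1
begf03 beg a in 1
begf03 beg a in 1
begf03 beg a sp 0
")

# Numerator: sum of resp (= number of correct trials) per participant x sound x lang
n_correct <- xtabs(resp ~ participant + sound + lang, data = d)
# Denominator: number of trials per cell, which is not constant across cells
n_trials <- xtabs(~ participant + sound + lang, data = d)

# Element-wise division; a cell with no trials would give NaN (0/0)
prop <- n_correct / n_trials

# Long format: one row per cell, proportion stored in a column named "prop"
prop_df <- as.data.frame(prop, responseName = "prop")
prop_df
```

With these rows advf03 gets 4/6 for "a" in "in" and 1/4 for "a" in "sp". Keeping empty cells as NaN rather than 0 keeps "no data" distinguishable from "all wrong".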

1 answer

Answer 0 (score: 0)

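If I understand your question correctly, you want the proportion of 1s by participant, sound, and language.

Since the proportion of 1s in a vector containing only 0s and 1s is simply its mean, this should work: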

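aggregate(resp ~ participant + group + sound + lang, data = df, FUN = mean)

Including sound gives one row per participant, sound, and language. group does not split the data any further, since each participant belongs to a single group, but listing it keeps it as a column in the result.

The result for the 50 observations is:

  participant group sound lang      resp
1     2advf03   adv     a   in 0.1875000
2     2advf03   adv     a   sp 0.1111111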
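The question also asked for percent correct as a column. A hedged base-R sketch: the data frame below rebuilds only the relevant columns of the posted 50-row sample (rows 1 to 32 are "in", rows 33 to 50 are "sp"; all rows are sound "a" for participant 2advf03), using character columns instead of factors, which is an assumption that does not change the result. aggregate() produces the summary data frame; ave(), whose default FUN is mean, is one alternative that writes the same cell proportion back onto every original row.

```r
# Rebuild the columns of the 50-observation sample that matter here
df <- data.frame(
  participant = "2advf03",
  group       = "adv",
  sound       = "a",
  lang        = rep(c("in", "sp"), times = c(32, 18)),
  resp        = rep(c(0, 1, 0, 1), times = c(26, 6, 16, 2))
)

# One row per participant x sound x lang; proportion correct lands in "resp"
res <- aggregate(resp ~ participant + group + sound + lang, data = df, FUN = mean)
res$pct <- 100 * res$resp  # percent correct as a new column

# Alternative: attach each cell's proportion to every original row
df$prop <- ave(df$resp, df$participant, df$sound, df$lang, FUN = mean)
```

aggregate() suits the "new data frame" part of the question, while ave() is handy if the proportions are later needed alongside the trial-level data.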