Question

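I have reaction time and accuracy data that need to be scored for each subject, and I would like to know which R package or function would best suit my needs. Below is a snippet of sample data for 2 subjects. Each row represents a single trial in which a subject responded to a stimulus.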

 date subject trialn blockcode     trialtype latency response correct
32913      15      1  practice    taskswitch    1765      205       1
32913      15      2  practice     cueswitch    4372      203       1
32913      15      3  practice cuerepetition    2523      203       0
32913      15      1      test     cueswitch    2239      205       1
32913      15      2      test cuerepetition    1244      203       1
32913      15      3      test    taskswitch    1472      203       0
32913      15      4      test     cueswitch    1877      205       1
32913      15      5      test    taskswitch    2271      203       1
30413      16      1  practice    taskswitch    1377      203       1
30413      16      2  practice    taskswitch    1648      203       1
30413      16      3  practice     cueswitch    1181      205       1
30413      16      1      test     cueswitch    1045      205       1
30413      16      2      test cuerepetition     969      203       0
30413      16      3      test     cueswitch     857      203       1
30413      16      4      test    taskswitch    1038      205       1
30413      16      5      test cuerepetition     836      203       0

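Here is a description of what I would like to do:

Looking only at "test" trials, compute for each subject:
- the total number of trials
- the number of trials with a latency (i.e., reaction time) below 300 ms
- the mean latency
- the mean of correct (i.e., accuracy)
Then, looking only at trials whose latency is within 3 standard deviations of the subject's mean latency, compute the mean latency for each trial type.
Finally, create a new data frame containing all of these variables along with the subject ID and date.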

Answer 1

The plyr package is handy for this sort of thing (so is data.table, but I don't know its syntax). Here is an example to get you started:

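library(plyr)

# Per-subject summary; tmp holds all rows belonging to one subject
my_function <- function(tmp) {
  data.frame(n_trials     = nrow(tmp),               # number of trials
             n_trialslat  = sum(tmp$latency < 300),  # trials with latency below 300 ms
             mean_latency = mean(tmp$latency))
}

# d is the data frame holding the trial data shown in the question
ddply(subset(d, blockcode == "test"), "subject", my_function)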
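If accuracy and the date should come along as well, plyr's summarise can be passed to ddply. This is only a sketch: the small data frame d below is a hand-built stand-in for the question's test trials, and the column names n_fast and accuracy are my own labels, not something from the original post.

```r
library(plyr)

# Stand-in for the question's data (test trials of subjects 15 and 16 only)
d <- data.frame(
  date      = rep(c(32913, 30413), each = 5),
  subject   = rep(c(15, 16), each = 5),
  blockcode = "test",
  latency   = c(2239, 1244, 1472, 1877, 2271, 1045, 969, 857, 1038, 836),
  correct   = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
)

# Group by subject and date so that both end up as columns in the result
res <- ddply(subset(d, blockcode == "test"), c("subject", "date"), summarise,
             n_trials     = length(latency),     # total number of trials
             n_fast       = sum(latency < 300),  # trials with latency below 300 ms
             mean_latency = mean(latency),
             accuracy     = mean(correct))       # proportion of correct trials
print(res)
```

Grouping by both subject and date is safe here because date does not vary within a subject in the sample.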

Answer 2

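Stack Overflow is not meant for tutorials, so be sure to check out the excellent online resources on data.table. The website is a good place to start, and there are plenty of questions about the package here on SO that cover almost anything.

Here, I just want to show you how easy it gets once you are used to the package's syntax.

First, let's load the package and read in your data: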

library(data.table)
str <- "date subject trialn blockcode     trialtype latency response correct
        32913      15      1  practice    taskswitch    1765      205       1
        32913      15      2  practice     cueswitch    4372      203       1
        32913      15      3  practice cuerepetition    2523      203       0
        32913      15      1      test     cueswitch    2239      205       1
        32913      15      2      test cuerepetition    1244      203       1
        32913      15      3      test    taskswitch    1472      203       0
        32913      15      4      test     cueswitch    1877      205       1
        32913      15      5      test    taskswitch    2271      203       1
        30413      16      1  practice    taskswitch    1377      203       1
        30413      16      2  practice    taskswitch    1648      203       1
        30413      16      3  practice     cueswitch    1181      205       1
        30413      16      1      test     cueswitch    1045      205       1
        30413      16      2      test cuerepetition     969      203       0
        30413      16      3      test     cueswitch     857      203       1
        30413      16      4      test    taskswitch    1038      205       1
        30413      16      5      test cuerepetition     836      203       0"
DT <- as.data.table(read.table(text=str, header=TRUE))

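Now, here is one of the things you asked for:

Looking only at "test" trials, compute for each unique subject the total number of trials, the number of trials with a latency (i.e., reaction time) below 300 ms, the mean latency, and the mean of correct (i.e., accuracy).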

DT[blockcode=="test", 
   list(TotalNr = .N,
        NrTrailLat = sum(latency < 300),
        MeanLat = mean(latency),
        MeanCor = mean(correct)), 
   by="subject"]
   subject TotalNr NrTrailLat MeanLat MeanCor
1:      15       5          0  1820.6     0.8
2:      16       5          0   949.0     0.6

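Basically, with just a few lines of code I could answer all of those questions. In my opinion, the syntax is also very straightforward. From our DT, we only want to look at observations where blockcode=="test". Next, we want to run all of the analyses separately for each subject, which is easily done with the by="subject" statement. The cool thing: if you want to split by several dimensions, just add them. For example, instead of ignoring the practice trials, let's look at them separately: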

DT[, 
   list(TotalNr = .N,
        NrTrailLat = sum(latency < 300),
        MeanLat = mean(latency),
        MeanCor = mean(correct)), 
   by="subject,blockcode"]
   subject blockcode TotalNr NrTrailLat  MeanLat   MeanCor
1:      15  practice       3          0 2886.667 0.6666667
2:      15      test       5          0 1820.600 0.8000000
3:      16  practice       3          0 1402.000 1.0000000
4:      16      test       5          0  949.000 0.6000000

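Now don't tell me that isn't awesome!

Let's try another one:

Additionally, create variables that contain the last (or first) value of date and subjectID (this is in order to get the date and subjectID into a new data frame).

I'm not sure what you mean by that, since date does not vary within a subject in your example. So let's make it a bit harder. Say we want to know the latency of the first trial for each subject,blockcode combination. For that, we should first sort DT so that we know the first row in each group is always the one with trialn equal to 1. (For this sample data it is not necessary, since the data already appear to be sorted.)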

setkey(DT, subject, blockcode, trialn)
DT[, list(FirstLat = latency[1]) , by="subject,blockcode"]
   subject blockcode FirstLat
1:      15  practice     1765
2:      15      test     2239
3:      16  practice     1377
4:      16      test     1045
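To see why the sorting step matters when rows are not already in trial order, here is a small sketch. The table X is a made-up rearrangement of a few of the test trials, used purely for illustration.

```r
library(data.table)

# Hypothetical mini table with rows deliberately out of trial order
X <- data.table(subject = c(15, 15, 15, 16, 16),
                trialn  = c(3, 1, 2, 2, 1),
                latency = c(1472, 2239, 1244, 969, 1045))

# Without sorting, latency[1] is just whichever row happens to come first per group
unsorted_first <- X[, list(FirstLat = latency[1]), by = "subject"]

# setkey() sorts X by reference, so the trialn == 1 row now leads each subject's rows
setkey(X, subject, trialn)
sorted_first <- X[, list(FirstLat = latency[1]), by = "subject"]

print(unsorted_first)
print(sorted_first)
```

Only the second result picks up the latency of trial 1 for each subject.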

However, you wanted to add this as a new column in DT. For that, you can use the := operator:

DT[, FirstLat := latency[1] , by="subject,blockcode"]  
DT
     date subject trialn blockcode     trialtype latency response correct FirstLat
 1: 32913      15      1  practice    taskswitch    1765      205       1     1765
 2: 32913      15      2  practice     cueswitch    4372      203       1     1765
 3: 32913      15      3  practice cuerepetition    2523      203       0     1765
 4: 32913      15      1      test     cueswitch    2239      205       1     2239
 5: 32913      15      2      test cuerepetition    1244      203       1     2239
 6: 32913      15      3      test    taskswitch    1472      203       0     2239
 7: 32913      15      4      test     cueswitch    1877      205       1     2239
 8: 32913      15      5      test    taskswitch    2271      203       1     2239
 9: 30413      16      1  practice    taskswitch    1377      203       1     1377
10: 30413      16      2  practice    taskswitch    1648      203       1     1377
11: 30413      16      3  practice     cueswitch    1181      205       1     1377
12: 30413      16      1      test     cueswitch    1045      205       1     1045
13: 30413      16      2      test cuerepetition     969      203       0     1045
14: 30413      16      3      test     cueswitch     857      203       1     1045
15: 30413      16      4      test    taskswitch    1038      205       1     1045
16: 30413      16      5      test cuerepetition     836      203       0     1045
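The grouped := idea also extends to the remaining part of the question (trimming at 3 standard deviations, then averaging per trial type). What follows is a rough sketch, assuming the trimming is based on test trials only and that one row per subject is wanted. Names such as subj_mean, subj_sd, NrFast and result are my own, and test is rebuilt by hand so the block runs on its own; with the table from above it would simply be DT[blockcode == "test"].

```r
library(data.table)

# Stand-in for DT[blockcode == "test"], typed in from the question's sample
test <- data.table(
  date      = rep(c(32913, 30413), each = 5),
  subject   = rep(c(15, 16), each = 5),
  trialtype = c("cueswitch", "cuerepetition", "taskswitch", "cueswitch", "taskswitch",
                "cueswitch", "cuerepetition", "cueswitch", "taskswitch", "cuerepetition"),
  latency   = c(2239, 1244, 1472, 1877, 2271, 1045, 969, 857, 1038, 836),
  correct   = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
)

# Add each subject's own mean and SD of latency as columns, by reference
test[, `:=`(subj_mean = mean(latency), subj_sd = sd(latency)), by = "subject"]

# Keep only trials within 3 SDs of that subject's mean
trimmed <- test[abs(latency - subj_mean) <= 3 * subj_sd]

# Mean latency per subject and trial type, then one column per trial type
by_type <- trimmed[, list(MeanLat = mean(latency)), by = list(subject, trialtype)]
wide    <- dcast(by_type, subject ~ trialtype, value.var = "MeanLat")

# Per-subject summary on all test trials, carrying date along via by
summ <- test[, list(TotalNr = .N,
                    NrFast  = sum(latency < 300),
                    MeanLat = mean(latency),
                    MeanCor = mean(correct)),
             by = list(subject, date)]

# One row per subject: summary columns plus trimmed per-type means
result <- merge(summ, wide, by = "subject")
print(result)
```

With only five test trials per subject, the 3 SD filter removes nothing in this sample; it only starts to matter with realistic trial counts.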

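So these are just a few ideas to get you started. I went to this effort because I wanted to show you that most things become really easy once you understand the basics. That should be motivation to work through the manual, which may feel like overkill at the beginning. But it is worth the effort, trust me! And I haven't even mentioned the best part: data.table is also really fast. Good luck with your analysis.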

How to compute variables for each subject in a dataset