Question

My question is about how to cleanly handle the large amount of data I have.

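I'm running an experiment with 4 conditions. There will be 20 participants in each condition, and when each one finishes the experiment I'm left with a text file containing 600 rows and 7 columns. The 600 rows correspond to the 600 trials each person completed; the 7 columns are the variables measured on each trial.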

So each person's data looks like this (except with 600 rows):

394        b               a                0          9773              1        1436
114        a               b                0          3595              1        1246
432        b               a                0          1272              1        1061
209        a               a                1          3514              1        120
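As a quick check of the format, the four sample rows can be parsed with the text = argument of read.table(). This is a minimal sketch: the column names are the ones used in the read.table() call further down, and it assumes the fields are separated by whitespace (the default sep = "", which also covers tabs). Note that category and category.choice come back as character columns, which matters once the data go into a numeric array.

```r
# Column names as used in the question's read.table() call
col.names <- c("stimulus", "category", "category.choice", "category.correct",
               "category.time", "memory.present", "memory.time")

# The four sample rows shown above (the real files have 600 rows).
# The default sep = "" splits on any run of spaces or tabs.
sample.rows <- "394 b a 0 9773 1 1436
114 a b 0 3595 1 1246
432 b a 0 1272 1 1061
209 a a 1 3514 1 120"

trials <- read.table(text = sample.rows, header = FALSE,
                     col.names = col.names, stringsAsFactors = FALSE)

# 4 rows, 7 columns; the two letter columns are character, the rest integer
str(trials)
```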

To run my analysis, it would be very useful if I could get all of these text files into one object called "data" with the following dimensions:

Experimental condition (1-4)
Participant number (1-20)
Trial number (1-600)
Variable (1-7)
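One way to allocate such an object is a 4-D array whose dimensions carry names. A sketch, assuming the condition labels and variable names that appear in the code further down; note that dimnames has to be a list with one entry per dimension (a NULL entry leaves that dimension unlabelled), not a plain character vector.

```r
condition.def <- c("ii-wm-long", "ii-wm-short", "wm-ii-long", "wm-ii-short")
var.names <- c("stimulus", "category", "category.choice", "category.correct",
               "category.time", "memory.present", "memory.time")

# dimnames must be a list with one element per dimension: the list's names
# label the dimensions, its elements label the individual levels.
# NULL leaves the 600 trials unlabelled.
data <- array(NA_real_, dim = c(4, 20, 600, 7),
              dimnames = list(condition = condition.def,
                              participantNumber = as.character(1:20),
                              trialNumber = NULL,
                              experimentalvariable = var.names))

dim(data)  # 4 20 600 7

# Cells can then be addressed by label, e.g. the first three reaction times
# of participant 1 in the first condition (all NA until the array is filled)
data["ii-wm-long", "1", 1:3, "category.time"]
```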

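My files are named like "ii-wm-long_1316994934_7_1.txt", where the "ii-wm-long" part identifies the experimental condition and the last number (here 1) identifies the participant number.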
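The condition label and participant number can also be pulled out of the file name with a single regular expression. The helper name parse.filename is hypothetical, and the pattern assumes every name has the form <condition>_<digits>_<digits>_<participant>.txt. match() is used instead of grep() so that the condition label has to equal an entry of condition.def exactly rather than being treated as a regular expression.

```r
# Hypothetical helper: split a name such as "ii-wm-long_1316994934_7_1.txt"
# into its condition label and participant number.
# Assumes the pattern <condition>_<digits>_<digits>_<participant>.txt
parse.filename <- function(f) {
  pattern <- "^([a-z-]+)_[0-9]+_[0-9]+_([0-9]+)\\.txt$"
  if (!grepl(pattern, f)) stop("unexpected file name: ", f)
  list(condition   = sub(pattern, "\\1", f),
       participant = as.integer(sub(pattern, "\\2", f)))
}

condition.def <- c("ii-wm-long", "ii-wm-short", "wm-ii-long", "wm-ii-short")

info <- parse.filename("ii-wm-long_1316994934_7_1.txt")
match(info$condition, condition.def)  # 1
info$participant                      # 1
```

Failing on an unexpected name also protects against stray files that list.files() picks up in the results folder.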

Currently my code looks like this:

#Get the names of the text files in the results folder
files <- list.files()

#Conditions - which ones correspond to which numbers
condition.def <- c("ii-wm-long","ii-wm-short","wm-ii-long","wm-ii-short")
#1 = ii-wm-long
#2 = ii-wm-short
#3 = wm-ii-long
#4 = wm-ii-short

#This is where everything will be stored
data <- array(NA,dim=c(4,20,600,7),dimnames=c("condition","participantNumber","trialNumber","experimentalvariable"))

#Loop for each participant's file
for (n in 1:length(files)){

#What condition is the person in?
condition <- unlist(strsplit(files[n],"\\_"))[1]
condition <- grep(condition,condition.def)

#What is their participant number (of the people in that condition)?
ppt <- as.integer(unlist(strsplit(unlist(strsplit(files[n],"\\_")),"\\."))[4])

#Read the text file into the array
data[condition,ppt,,] <- read.table(files[n],sep="\t",header=F,nrows=600,col.names=c("stimulus","category","category.choice","category.correct","category.time","memory.present","memory.time"))

}

I get the error:

Error in data[condition, ppt, , ] <- read.table(files[n], sep = "\t",  : 
  incorrect number of subscripts

I've read about cbind and abind, but I can't work out how they would let me read the data in one file at a time.

What is the correct way to take a 2D array and make it the last two dimensions of a 4D array?
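Assigning into the last two dimensions works as long as the right-hand side is a matrix with the same shape as the slice. A toy-sized sketch (the dimensions here are made up for illustration):

```r
# Toy-sized stand-in for the real array:
# 2 conditions x 3 participants x 4 trials x 2 variables
arr <- array(NA_real_, dim = c(2, 3, 4, 2))

# A 4 x 2 matrix playing the role of one participant's file
# (rows = trials, columns = variables)
m <- matrix(c(10, 20, 30, 40, 1, 2, 3, 4), nrow = 4, ncol = 2)

# Fixing the first two subscripts and leaving the last two empty selects
# a 4 x 2 slice; a matrix of the same shape fills it so that
# arr[2, 3, i, j] == m[i, j]
arr[2, 3, , ] <- m

arr[2, 3, , ]     # the same 4 x 2 matrix
sum(!is.na(arr))  # 8: only that slice was filled
```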

Answer 1

read.table returns a data.frame, so at the very least you need to wrap it in as.matrix:

data[condition,ppt,,] <- 
    as.matrix(read.table(files[n], sep="\t", header=FALSE, nrows=600, 
                         col.names=c("stimulus", "category", "category.choice",
                                     "category.correct", "category.time", 
                                     "memory.present", "memory.time")))
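One caveat worth checking: as.matrix() on a data.frame that still contains character columns (category and category.choice hold the letters a and b) returns a character matrix, and assigning that into the array would silently turn the whole array into character. A minimal sketch of the problem and one possible fix, assuming those columns only ever contain "a" or "b":

```r
# Two rows in the same layout as the real files (other columns omitted)
df <- data.frame(stimulus = c(394, 114),
                 category = c("b", "a"),
                 category.time = c(9773, 3595),
                 stringsAsFactors = FALSE)

typeof(as.matrix(df))  # "character": one text column makes every cell text

# Recode the letters to fixed integer codes first (a -> 1, b -> 2);
# match() gives NA for any other value, which is worth checking for
df$category <- match(df$category, c("a", "b"))
typeof(as.matrix(df))  # "double"
```

data.matrix() performs a similar conversion automatically, but its integer codes come from the levels present in each individual file, so fixed codes via match() are easier to keep consistent across participants.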

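However, this is very fragile, because you are going straight from I/O into an array slice. You should do some sanity checks so that you know the input is what you expect, that the file exists, and so on.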
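A sketch of such checks, wrapped in a hypothetical read.participant() helper. It assumes tab-separated files (as in the question's sep = "\t"), exactly n.trials rows, and letter columns containing only "a" or "b"; each check stops with a message naming the file rather than writing a bad slice.

```r
# Hypothetical helper: read one participant's file and stop with a clear
# message if it does not look as expected, instead of writing a bad slice.
read.participant <- function(f, col.names, n.trials = 600,
                             letter.cols = c("category", "category.choice")) {
  if (!file.exists(f)) stop("file not found: ", f)

  # Assumes tab-separated files, as in the question's sep = "\t"
  x <- read.table(f, sep = "\t", header = FALSE, col.names = col.names,
                  stringsAsFactors = FALSE)
  if (nrow(x) != n.trials)
    stop(f, ": expected ", n.trials, " rows, found ", nrow(x))

  # Recode letter columns to fixed codes (a -> 1, b -> 2); any other value
  # becomes NA and is reported
  for (col in letter.cols) {
    codes <- match(x[[col]], c("a", "b"))
    if (anyNA(codes)) stop(f, ": unexpected value in column ", col)
    x[[col]] <- codes
  }

  # Any stray text left in a numeric column would make the matrix character
  m <- as.matrix(x)
  if (!is.numeric(m)) stop(f, ": non-numeric columns remain")
  m
}
```

Inside the loop it would replace the direct read.table() call, e.g. data[condition, ppt, , ] <- read.participant(files[n], col.names = col.names), where col.names is the vector of the seven variable names.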

Embedding a 2D array as part of a higher-dimensional array in R
