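I am writing a simple program that should parse one .tsv file into several .csv files. The problem is that it takes far too long (I think 9 minutes for 50,000 lines is terrible performance). Could someone please look at my code and tell me what I am doing wrong?
My table contains the name of the participant, the name of the media, a timestamp, and some coordinate data. The data can contain one or more participants, and each participant works with 2 media files. I want to create a csv file for each media file of each participant.
For example, I have 2 participants, P1 and P2, and each of them uses media files M1 and M2. So I want to create P1_M1.csv, P1_M2.csv, P2_M1.csv, and P2_M2.csv.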
The data looks like this:
P1 | M1 | data...
P1 | M1 | data...
...
P1 | M2 | data...
...
P2 | m1 | data...
...
...
Here is my code:
data = read.table("./data.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE) # load data from tsv

# function for creating csv file
writeData = function(filename, d){
  filename = paste("./", filename, ".csv", sep = "")
  write.csv(d, file = filename, row.names = FALSE)
}

# initialize auxiliary variables
participantName = ""
mediaName = ""

# initialize empty dataframe
subdata <- data.frame(TimeStamp = numeric(), GazeLeftX = integer(), GazeLeftY = integer(), GazeRightX = integer(), GazeRightY = integer())

# for each row in original data...
for(r in 1:nrow(data))
{
  # check whether the last participant differs from the participant on the current row
  if(participantName != data[r, 'ParticipantName']){
    # check that the last participant is not empty (i.e. a participant has already been processed)
    if(participantName != ""){
      # if so, that participant's work on the media file has ended, so write the data to csv
      writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
      # empty the auxiliary dataframe and also mediaName
      subdata = subdata[0,]
      mediaName = ""
    }
    # we detected a new participant, so record it in the last participant variable
    participantName = data[r, 'ParticipantName']
  }
  # do the same checks for the media file, because the media file can change while the participant stays the same
  if(mediaName != data[r, 'MediaName']){
    if(mediaName != ""){
      writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
      subdata = subdata[0,]
    }
    mediaName = data[r, 'MediaName']
  }
  # in every iteration append the current row to the auxiliary dataframe
  subdata = rbind(subdata,
                  data.frame(TimeStamp = data[r, 'EyeTrackerTimestamp'],
                             GazeLeftX = data[r, 'GazeLeftX'],
                             GazeLeftY = data[r, 'GazeLeftY'],
                             GazeRightX = data[r, 'GazeRightX'],
                             GazeRightY = data[r, 'GazeRightY']))
}

# if there is any data left in the auxiliary dataframe, save it to csv
if(nrow(subdata) != 0){
  writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
}
Answer 0 (score: 1)
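You are looking for ?split. Try, for example:
split(data,data[,c("ParticipantName","MediaName")],drop=TRUE)
This creates a list containing one data.frame for each ParticipantName-MediaName pair. If you want to write each data frame to a separate file, you can try something like:
res<-split(data,data[,c("ParticipantName","MediaName")],drop=TRUE)
Map(writeData,names(res),res)
where writeData is the function you defined.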
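A likely reason the loop in the question is slow is that rbind copies the accumulated data frame on every iteration, so splitting the whole table once should avoid most of that cost. The following is a minimal, self-contained sketch of the split/Map approach, not a definitive implementation. The toy data frame, the outDir temp directory, and the helper names cols, groups, and written are assumptions made for illustration; only the column names come from the question. It also assumes an R version in which split accepts sep, so that the list names come out as P1_M1 rather than the default P1.M1.

```r
# Toy data standing in for data.tsv (values are made up for illustration)
data <- data.frame(
  ParticipantName     = c("P1", "P1", "P1", "P2", "P2"),
  MediaName           = c("M1", "M1", "M2", "M1", "M2"),
  EyeTrackerTimestamp = c(1, 2, 3, 4, 5),
  GazeLeftX  = c(10L, 11L, 12L, 13L, 14L),
  GazeLeftY  = c(20L, 21L, 22L, 23L, 24L),
  GazeRightX = c(30L, 31L, 32L, 33L, 34L),
  GazeRightY = c(40L, 41L, 42L, 43L, 44L),
  stringsAsFactors = FALSE
)

# Assumption: write into a temporary directory instead of "./"
outDir <- tempdir()

# Keep only the columns the original loop wrote, renaming the timestamp column
cols <- c("EyeTrackerTimestamp", "GazeLeftX", "GazeLeftY", "GazeRightX", "GazeRightY")
subdata <- data[, cols]
names(subdata)[1] <- "TimeStamp"

# One list element per participant/media pair; sep = "_" gives names like "P1_M1"
groups <- split(subdata, data[, c("ParticipantName", "MediaName")],
                drop = TRUE, sep = "_")

# Write each group to <outDir>/<name>.csv and collect the file paths
written <- Map(function(name, d) {
  path <- file.path(outDir, paste0(name, ".csv"))
  write.csv(d, file = path, row.names = FALSE)
  path
}, names(groups), groups)
```

Unlike the loop in the question, this groups rows by value rather than by position, so rows of the same participant and media file end up in one file even if they are not contiguous in the input.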