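I am currently working with data generated by an EyeLink eye tracker. The csv (converted from asc) is basically one large sequential list, i.e. no columns are created. For example, one row will have 'start_trial 1', the following N rows will have the x and y coordinates, then a 'PreBeep1_1st_Sketchpad' row appears, followed by more coordinate rows, and eventually a 'start_trial 2' row.
Does anyone have suggestions on how to manipulate this 'stacked' data and convert it to long-format data?
Here is what the data looks like when extracted from the csv: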
MSG 12892743 start_trial 1 SCNB
12892743 757.0 361.7 5916.0 ... SCNB
MSG 12892744 PreBeep1_1st_Sketchpad SCNB
12892744 756.7 361.7 5920.0 ... SCNB
12892745 756.1 362.2 5924.0 ... SCNB
MSG 12892746 order of frames: SCNB
12892746 755.8 362.3 5928.0 ... SCNB
12892747 756.7 362.3 5927.0 ... SCNB
MSG 12892748 crosshair SCNB
12892748 757.8 361.8 5928.0 ... SCNB
12892749 758.4 361.8 5930.0 ... SCNB
MSG 12892750 sketchpad SCNB
12892750 758.1 361.7 5934.0 ... SCNB
12892751 758.3 361.7 5938.0 ... SCNB
MSG 12892752 sketchpad SCNB
12892752 759.1 361.9 5948.0 ... SCNB
12892753 760.4 362.7 5956.0 ... SCNB
MSG 12892754 sketchpad SCNB
12892754 761.7 363.5 5964.0 ... SCNB
12892755 763.9 364.0 5966.0 ... SCNB
MSG 12892756 buffer1 SCNB
12892756 765.6 364.1 5970.0 ... SCNB
12892757 766.2 364.3 5972.0 ... SCNB
MSG 12892758 Diode1 SCNB
12892758 765.2 364.3 5973.0 ... SCNB
12892759 764.1 364.5 5964.0 ... SCNB
12892760 763.9 364.7 5955.0 ... SCNB
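Ideally, I would like separate columns for:
Trial ID (SCNB shown above)
Frame ID (PreBeep1_1st_Sketchpad above)
X-CoOr (757.0 above)
Y-CoOr (361.7 above)
Time (5916.0 above)
If it helps, the csv file is delimited (the code below reads it with a tab delimiter).
As can be seen, the data is written sequentially, row by row from top to bottom, rather than being organized into columns in the shape I want.
The ' ...' is also an actual value.
Regarding the column containing the Frame ID (e.g. 'start_trial' and 'PreBeep1_1st_Sketchpad'): ideally I would like the frame name to be repeated down the column until a new one is encountered.
Any help or suggestions are much appreciated.
Edit: the output should look like this: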
Trial ID Frame ID X-CoOr Y-CoOr Time
SCNB Start_Trial 757.0 361.7 5916.0
SCNB PreBeep1_1st_Sketchpad 756.7 361.7 5920.0
SCNB PreBeep1_1st_Sketchpad 756.1 362.2 5924.0
Thank you for taking the time to read this.
Edit:
Here is the code I am using:
import csv

# On Python 3 the csv module expects text-mode files opened with newline=''
file2 = open('P1E2E_Both_New_trial_data.csv', newline='')
Long_Format = open('P1E2E_Long_Format.csv', 'w', newline='')
writer1 = csv.writer(Long_Format, delimiter='\t')
# First create column headings
columns = ['Trial ID', 'Frame ID', 'X-CoOr', 'Y-CoOr', 'Time']
writer1.writerow(columns)
reader1 = csv.reader(file2, delimiter='\t')
for row in reader1:
    # if statement here to skip blank lines
    if len(row) > 1:
        if 'start_trial' in row[1]:
            label = [row[3]] + ['start_trial']
            writer1.writerow(label)
file2.close()  # <---IMPORTANT
Long_Format.close()
The output of the above is:
Trial ID Frame ID X-CoOr Y-CoOr Time
SCNB start_trial
RCL start_trial
SCR start_trial
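...... and so on.
My problem is that I don't know where to go from here. Even if it worked, my approach would be very inefficient. I don't know how to tell Python to keep reading the rows after the 'Start_Trial' label and, inside the if statement, write the x and y coordinate values from row[2] and row[3] into the corresponding columns after that label. Does that make sense?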
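Answer 0 (score: 1)
If we assume that all rows use the same delimiter, this problem isn't as bad as it looks.
The key is to realize that all of the frame rows start with the key 'MSG':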
import csv

# Header values
FRAME_KEY = 'MSG'
FRAME_IDX = 0
TRIAL_ID_KEY = 'Trial ID'
TRIAL_ID_IDX = 3
FRAME_ID_KEY = 'Frame ID'
FRAME_ID_IDX = 2
# Data values
XCOR_KEY = 'X-CoOr'
XCOR_IDX = 1
YCOR_KEY = 'Y-CoOr'
YCOR_IDX = 2
TIME_KEY = 'Time'
TIME_IDX = 3
IN_DELIM = '\t'
OUT_DELIM = '\t'
OUT_HEADER = [TRIAL_ID_KEY, FRAME_ID_KEY, XCOR_KEY, YCOR_KEY, TIME_KEY]

# On Python 3 the csv module expects text-mode files opened with newline=''
with open('P1E2E_Both_New_trial_data.csv', newline='') as in_file, \
        open('P1E2E_Long_Format.csv', 'w', newline='') as out_file:
    in_reader = csv.reader(in_file, delimiter=IN_DELIM)
    out_writer = csv.DictWriter(out_file, OUT_HEADER, delimiter=OUT_DELIM)
    out_writer.writeheader()
    current_frame = None
    current_trial = None
    for row in in_reader:
        if not row:
            # Skip blank lines, which would otherwise raise an IndexError
            continue
        if row[FRAME_IDX] == FRAME_KEY:
            # Means we're at the start of a new frame
            current_frame = row[FRAME_ID_IDX]
            current_trial = row[TRIAL_ID_IDX]
        else:
            # Means we're in a data row
            out_row = dict()
            out_row[FRAME_ID_KEY] = current_frame
            out_row[TRIAL_ID_KEY] = current_trial
            out_row[XCOR_KEY] = row[XCOR_IDX]
            out_row[YCOR_KEY] = row[YCOR_IDX]
            out_row[TIME_KEY] = row[TIME_IDX]
            out_writer.writerow(out_row)
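Basically, when you hit a row with the 'MSG' key, you know you are starting a new frame. Otherwise you write out the data. DictWriter lets you do this easily without worrying about column order (the order is defined by OUT_HEADER).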
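To see the "remember the last MSG row" idea in isolation, here is a minimal Python 3 sketch that runs on a few in-memory rows instead of the real file. The sample rows, the assumed column positions (frame label in field 2 and trial ID in field 3 of a 'MSG' row) and the function name to_long_format are all assumptions made for illustration; check them against your actual csv before relying on this.

```python
import csv
import io

# Hypothetical tab-separated sample modelled on the rows in the question.
SAMPLE = (
    "MSG\t12892743\tstart_trial 1\tSCNB\n"
    "12892743\t757.0\t361.7\t5916.0\t...\tSCNB\n"
    "MSG\t12892744\tPreBeep1_1st_Sketchpad\tSCNB\n"
    "12892744\t756.7\t361.7\t5920.0\t...\tSCNB\n"
    "\n"
    "12892745\t756.1\t362.2\t5924.0\t...\tSCNB\n"
)

HEADER = ["Trial ID", "Frame ID", "X-CoOr", "Y-CoOr", "Time"]


def to_long_format(in_file, out_file):
    """Copy data rows to out_file, tagging each with the most recent MSG row."""
    reader = csv.reader(in_file, delimiter="\t")
    writer = csv.DictWriter(out_file, HEADER, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    frame = trial = None
    for row in reader:
        if not row:
            continue  # blank line
        if row[0] == "MSG":
            # Assumed layout: MSG, timestamp, frame label, trial ID
            frame, trial = row[2], row[3]
        else:
            # Assumed layout: timestamp, x, y, time, ...
            writer.writerow({
                "Trial ID": trial,
                "Frame ID": frame,
                "X-CoOr": row[1],
                "Y-CoOr": row[2],
                "Time": row[3],
            })


out = io.StringIO()
to_long_format(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

Because the rows are passed as dictionaries, the keys can be given in any order; the header list alone decides the column order in the output.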
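Answer 1 (score: 0)
I have adapted the answer submitted by @aruisdante, because the original code did not record every instance of the frame ID. I noticed this when counting the start_trial frame IDs: they did not add up to the known total.
Here is the revised code: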
import csv

FRAME_KEY = 'MSG'
FRAME_IDX = 0
FRAME_ID_KEY = 'Frame ID'
FRAME_ID_IDX = 1
TRIAL_ID_KEY = 'Trial ID'
TRIAL_ID_IDX = 2
# Data values
XCOR_KEY = 'X-CoOr'
XCOR_IDX = 1
YCOR_KEY = 'Y-CoOr'
YCOR_IDX = 2
TIME_KEY = 'Time'
TIME_IDX = 3
IN_DELIM = '\t'
OUT_DELIM = '\t'
OUT_HEADER = [TRIAL_ID_KEY, FRAME_ID_KEY, XCOR_KEY, YCOR_KEY, TIME_KEY]

# On Python 3 the csv module expects text-mode files opened with newline=''
with open('P1E2E_Both_New_trial_data.csv', newline='') as in_file, \
        open('P1E2E_Long_Format.csv', 'w', newline='') as out_file:
    in_reader = csv.reader(in_file, delimiter=IN_DELIM)
    out_writer = csv.DictWriter(out_file, OUT_HEADER, delimiter=OUT_DELIM)
    out_writer.writeheader()
    current_frame = None
    current_trial = None
    for row in in_reader:
        if not row:
            # Skip blank lines, which would otherwise raise an IndexError
            continue
        if row[FRAME_IDX] == FRAME_KEY:
            # Means we're at the start of a new frame
            current_frame = row[FRAME_ID_IDX]
            current_trial = row[TRIAL_ID_IDX]
            # Here ensures that 'start_trial' labels are recorded.
            # A fresh dict is used so no values from earlier rows leak in;
            # DictWriter leaves the missing columns empty.
            if 'start_trial' in current_frame:
                out_writer.writerow({TRIAL_ID_KEY: current_trial,
                                     FRAME_ID_KEY: current_frame})
        elif current_frame is not None and 'start_trial' not in current_frame:
            # Means we're in a data row.
            # Write everything except rows under 'start_trial' to ensure
            # no repetition of this particular label.
            out_row = dict()
            out_row[FRAME_ID_KEY] = current_frame
            out_row[TRIAL_ID_KEY] = current_trial
            out_row[XCOR_KEY] = row[XCOR_IDX]
            out_row[YCOR_KEY] = row[YCOR_IDX]
            out_row[TIME_KEY] = row[TIME_IDX]
            out_writer.writerow(out_row)
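
The check that prompted this revision (counting the start_trial labels and comparing against the known number of trials) can be sketched roughly as follows in Python 3. The in-memory long-format sample and the function name count_label_rows are hypothetical and only there to make the snippet runnable; with the real output you would pass the opened P1E2E_Long_Format.csv instead.

```python
import csv
import io

# Hypothetical long-format output with one 'start_trial' row per trial.
LONG_FORMAT = (
    "Trial ID\tFrame ID\tX-CoOr\tY-CoOr\tTime\n"
    "SCNB\tstart_trial 1\t\t\t\n"
    "SCNB\tPreBeep1_1st_Sketchpad\t756.7\t361.7\t5920.0\n"
    "SCNB\tPreBeep1_1st_Sketchpad\t756.1\t362.2\t5924.0\n"
    "RCL\tstart_trial 2\t\t\t\n"
    "RCL\tcrosshair\t757.8\t361.8\t5928.0\n"
)


def count_label_rows(long_file, label):
    """Count the rows whose 'Frame ID' column contains the given label."""
    reader = csv.DictReader(long_file, delimiter="\t")
    return sum(1 for row in reader if label in row["Frame ID"])


n = count_label_rows(io.StringIO(LONG_FORMAT), "start_trial")
print(n)  # 2 for the sample above
```

If the count differs from the number of trials you expect, some 'MSG' rows are probably being missed or written more than once.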