I have a bunch of tab-delimited data files that look like this:
Subject  Phase  Condition  Trial  Trial Type  Target Loc  TargetID   DistID     Digit1  Digit2  Accuracy-T  RT-P         RT-T
2        1      9          1      cong        bottom      S H I F T  S H I F T  7       2       1           742.69104    681.4379692
2        1      9          2      cong        top         P A S T E  P A S T E  2       3       1           699.4130611  454.8609257
2        1      9          3      incong      top         S U G A R  Y O U T H  6       5       1           979.2759418  31.06093407
2        1      9          4      incong      top         C H E E K  G R O A N  4       8       1           1025.339842  31.55088425
2        1      9          5      incong      bottom      S T A L K  L E A V E  7       9       1           555.9248924  479.6338081
2        1      9          6      incong      top         B R A I N  F I E L D  4       5       2           976.7041206  31.50486946
2        1      9          7      incong      bottom      C R O W N  P L A T E  5       7       1           0            32.24992752
2        1      9          8      cong        top         S T A N D  S T A N D  7       6       1           1092.888117  31.59618378
2        1      9          9      cong        bottom      R O U T E  R O U T E  4       8       1           883.2840919  31.32796288
2        1      9          10     cong        top         F L O A T  F L O A T  5       6       1           768.682003
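What I want to do is remove from each file any row whose value under the 'Accuracy-T' header is '2' or '3' (in case the columns look misaligned above, it is the 10th value in each row, counting from zero).
So the basic idea is a Python script that applies this to multiple files (each read in as 'studyfile') and writes out a new tab-delimited text file with those rows removed (written as 'goodstudyfile'). This is what I came up with: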
GroupVar=['1','2']
SubjectVar=['1','2']
CondVar=['1','2','3','4','5','6','7','8','9','10','11','12']
for group in GroupVar:
    for subject in SubjectVar:
        for condition in CondVar:
            studyfile_name = '*/Pruning/Study 126/Group_'+str(group)+'_Subject_'+str(subject)+'_Condition_'+str(condition)+'_phase_1.txt'
            studyfile = open(studyfile_name,'r')
            goodstudyfile_name = '*/Pruning/Study 126/Phase 1 No Errors/Group_'+str(group)+'_Subject_'+str(subject)+'_Condition_'+str(condition)+'_phase_1_Fixed.txt'
            goodstudyfile = open(goodstudyfile_name,'w')
            study_lines = studyfile.readlines()
            studyfile.close()
            first_block = study_lines[4].split('\t')[1].strip()
            NR_errors_removed = 0
            R_errors_removed = 0
            spoils_removed = 0
            low_cutoff_spoils = 0
            for study_line in study_lines:
                if len(study_line.split('\t')) > 2:
                    if study_line.split('\t')[10] == '2':
                        if study_line.split('\t')[4] == 'incong':
                            study_lines.remove(study_line)
                            NR_errors_removed+=1
                        elif study_line.split('\t')[4] == 'cong':
                            study_lines.remove(study_line)
                            R_errors_removed+=1
                    elif study_line.split('\t')[10] == '3':
                        study_lines.remove(study_line)
                        spoils_removed+=1
                    else:
                        for study_line in study_lines[1:]:
                            if int(float(study_line.split('\t')[12][:8])) < 100.00:
                                study_lines.remove(study_line)
                                low_cutoff_spoils+=1
            print 'Group:' + str(group) + ' Subject:' + str(subject) + ' Condition:' + str(condition)
            print 'NR Errors:'+ str(NR_errors_removed)
            print 'R Errors:'+ str(R_errors_removed)
            print 'Spoils:'+ str(spoils_removed)
            print 'low cutoff Spoils:'+ str(low_cutoff_spoils)
            goodstudyfile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\n'.format(NR_errors_removed, 'NR errors removed', R_errors_removed, 'R errors removed',spoils_removed, 'spoils removed',low_cutoff_spoils, 'low cutoff spoils'))
            goodstudyfile.write('{}\n'.format(first_block))
            for line in study_lines:
                goodstudyfile.write(line)
            goodstudyfile.close()
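This iterates over all of my files without trouble (48 files, one for every combination of group, subject, and condition), but for some reason it frequently misses rows that should be removed. So in the supposedly "fixed" files I still have a bunch of rows that should have been deleted.
Nothing I do seems to fix or even change the result: the missed rows are always the same (e.g., it always misses row 7 of Group2_Subject1_Condition_6, even though row 7 is marked '2'). Can anyone tell me where I'm going wrong?
Here is an example of a row that gets missed: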
Subject  Phase  Condition  Trial  Trial Type  Target Loc  TargetID   DistID     Digit1  Digit2  Accuracy-T  RT-P           RT-T
1        1      6          25     incong      top         V A L U E  G U I D E  9       7       2           304.780960083  866.713047028
This row should have been pruned by the Python script, since it has a value of '2' under Accuracy-T.
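One possible explanation that fits the symptom (the same rows skipped on every run) is that the script calls `study_lines.remove(...)` while a `for` loop is iterating over that same list. Each removal shifts the remaining elements left, so the row immediately after a removed row is never visited. The sketch below is only an illustration under that assumption: `remove_while_iterating`, `prune_rows`, and the column constants are hypothetical names, the sample rows are copied from this post, and the RT cutoff branch of the script is left out.

```python
ACCURACY_COL = 10    # zero-based index of Accuracy-T, as used in the script above
TRIAL_TYPE_COL = 4   # zero-based index of Trial Type


def remove_while_iterating(items, bad):
    """Reproduce the pitfall: mutate a list while looping over it."""
    items = list(items)
    for item in items:
        if item in bad:
            items.remove(item)  # shifts later elements; the next one is skipped
    return items


def prune_rows(lines):
    """Return (kept_lines, counts) by building a new list instead of removing."""
    counts = {'NR': 0, 'R': 0, 'spoils': 0}
    kept = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) > ACCURACY_COL:
            accuracy = fields[ACCURACY_COL].strip()
            trial_type = fields[TRIAL_TYPE_COL].strip()
            if accuracy == '2' and trial_type == 'incong':
                counts['NR'] += 1
                continue  # drop the row
            if accuracy == '2' and trial_type == 'cong':
                counts['R'] += 1
                continue
            if accuracy == '3':
                counts['spoils'] += 1
                continue
        kept.append(line)
    return kept, counts


# Sample rows taken from the question, rebuilt with tab separators.
header = ('Subject\tPhase\tCondition\tTrial\tTrial Type\tTarget Loc\tTargetID\t'
          'DistID\tDigit1\tDigit2\tAccuracy-T\tRT-P\tRT-T\n')
row_ok = '2\t1\t9\t1\tcong\tbottom\tS H I F T\tS H I F T\t7\t2\t1\t742.69104\t681.4379692\n'
row_bad_a = '2\t1\t9\t6\tincong\ttop\tB R A I N\tF I E L D\t4\t5\t2\t976.7041206\t31.50486946\n'
row_bad_b = '1\t1\t6\t25\tincong\ttop\tV A L U E\tG U I D E\t9\t7\t2\t304.780960083\t866.713047028\n'

skipped_demo = remove_while_iterating(['a', '2', '2', 'b'], {'2'})
kept_rows, removal_counts = prune_rows([header, row_ok, row_bad_a, row_bad_b])
```

In the small demo, two adjacent '2' entries are enough to leave one behind, which would match rows surviving only when they directly follow another removed row. Iterating over a copy (`for study_line in study_lines[:]`) is another common way to avoid the shift, at the cost of keeping the `remove` calls.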