I am using the script below to extract molecular barcodes from a FASTQ file. However, I keep getting the following KeyError:
  File "extractMolecularBarcode.py", line 42, in <module>
    dicoBarcode[barcode] += 1
KeyError: '\n'
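I know a KeyError means that some key is not defined in the dictionary, but I can't figure out what the problem is. Can you help? Thank you very much!
Here is the script: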
import sys, itertools

iFastq=open(sys.argv[1], 'r')
oFastq=open(sys.argv[2], 'w')
oBarcode=open(sys.argv[3], 'w')
oLigation=open(sys.argv[4], 'w')

dicoBarcode={}
dicoLigation={}
nct='ACTGN'
for barcode in list(itertools.product(nct, repeat=6)):
    dicoBarcode["".join(barcode)] = 0
    dicoLigation["".join(barcode)] = 0

header= iFastq.readline().rstrip()
while header != '':
    totseq= iFastq.readline()
    plus = iFastq.readline()
    qual = iFastq.readline()
    barcode = totseq[0:6]
    ligation = totseq[3:9]
    seq = totseq[6:]
    oFastq.write(header.split(" ")[0]+'_MolecularBarcode:'+barcode+' '+header.split(" ")[1]+'\n')
    oFastq.write(seq)
    oFastq.write(plus)
    oFastq.write(qual[6:])
    header= iFastq.readline().rstrip()
    dicoBarcode[barcode] += 1
    if len(seq) >= 4 :
        dicoLigation[ligation] += 1

for barcode, times in dicoBarcode.items():
    oBarcode.write("%s\t%s\n" % (barcode, str(times)))
for ligation, times in dicoLigation.items():
    oLigation.write("%s\t%s\n" % (ligation, str(times)))
Answer 0 (score: 1)
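When you execute dicoBarcode[barcode] += 1, the barcode value is a newline character ('\n'), and that is what causes the error! This happens when a sequence line in the file is empty: readline() returns just '\n', so totseq[0:6] is '\n'.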
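A minimal reproduction, assuming the offending record has an empty sequence line (for example a read trimmed to zero length). readline() keeps the trailing newline, and slicing past the end of a string just returns whatever characters exist:

```python
import itertools

# Same key set as the script: every 6-mer over A, C, T, G, N (5**6 = 15625 keys).
dicoBarcode = {"".join(p): 0 for p in itertools.product("ACTGN", repeat=6)}

# What readline() returns for an empty sequence line.
totseq = "\n"
barcode = totseq[0:6]

print(repr(barcode))           # '\n'
print(barcode in dicoBarcode)  # False, so dicoBarcode[barcode] += 1 raises KeyError

# A read shorter than 6 bases fails the same way, just with a different key.
short = "ACG\n"[0:6]
print(repr(short))             # 'ACG\n'
```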
You can get around it by supplying a default value:
dicoBarcode[barcode] = dicoBarcode.get(barcode, 0) + 1
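dict.get returns a default but does not store anything, so for counting the result has to be written back. A sketch of two ways to count without a KeyError, using a made-up list of barcodes. Either way the '\n' barcode is simply counted as its own key, so this hides the symptom rather than fixing the input:

```python
from collections import Counter

barcodes = ["ACGTAC", "ACGTAC", "\n", "TTTTTT"]  # hypothetical sample data

# Option 1: dict.get with a default of 0, then store the incremented value.
counts = {}
for barcode in barcodes:
    counts[barcode] = counts.get(barcode, 0) + 1

# Option 2: Counter treats a missing key as 0, so += 1 never raises KeyError.
counter = Counter()
for barcode in barcodes:
    counter[barcode] += 1

print(counts)  # {'ACGTAC': 2, '\n': 1, 'TTTTTT': 1}
```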
Or you can strip the newline first and then process the file ;)
yourfile.readline().rstrip("\n")
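Stripping alone turns '\n' into '', which is not a key either, so an empty read would still raise KeyError (now with '' instead of '\n'). A sketch that strips and then checks the length, assuming reads shorter than the 6-base barcode should simply be skipped; that policy and the helper name count_barcode are assumptions, not part of the original script:

```python
import itertools

# Same 6-mer key set as the script.
dicoBarcode = {"".join(p): 0 for p in itertools.product("ACTGN", repeat=6)}

def count_barcode(totseq, counts, barcode_len=6):
    """Count the leading barcode of one raw sequence line.

    Returns False without counting when the read is shorter than the
    barcode (skipping such reads is an assumed policy).
    """
    seq = totseq.rstrip("\n")
    if len(seq) < barcode_len:
        return False  # "" or a partial barcode would raise KeyError
    counts[seq[:barcode_len]] += 1  # a valid key as long as the read only uses A/C/T/G/N
    return True

print(repr("\n".rstrip("\n")))                     # '' : still not a key
print(count_barcode("\n", dicoBarcode))            # False
print(count_barcode("ACGTACGGTT\n", dicoBarcode))  # True
print(dicoBarcode["ACGTAC"])                       # 1
```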
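Answer 1 (score: 0)
The problem is that you are not calling rstrip() on the actual line that the barcode value comes from. You need to edit the following lines: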
totseq= iFastq.readline()
plus = iFastq.readline()
qual = iFastq.readline()
so that they look like this:
totseq= iFastq.readline().rstrip()
plus = iFastq.readline().rstrip()
qual = iFastq.readline().rstrip()
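One caveat with this change: the loop writes seq, plus and qual straight to oFastq, and after rstrip() those lines no longer end in '\n', so the output records would run together. rstrip() also turns an empty read into '' rather than skipping it. A sketch of the record loop with both handled, written as a function over already-open file objects so it can be tried with io.StringIO. The name process_fastq, the skip-short-reads policy, and omitting the ligation counts for brevity are assumptions; like the original, it expects every header to contain a space:

```python
import io
import itertools

def process_fastq(i_fastq, o_fastq, barcode_len=6):
    """Move the first barcode_len bases of each read into the header.

    Returns a dict of barcode counts. Reads shorter than barcode_len are
    skipped (an assumed policy).
    """
    counts = {"".join(p): 0 for p in itertools.product("ACTGN", repeat=barcode_len)}
    while True:
        header = i_fastq.readline().rstrip()
        if header == "":
            break  # end of file, same stop condition as the original loop
        totseq = i_fastq.readline().rstrip()
        plus = i_fastq.readline().rstrip()
        qual = i_fastq.readline().rstrip()
        if len(totseq) < barcode_len:
            continue  # empty or too-short read: no complete barcode to count
        barcode = totseq[:barcode_len]
        fields = header.split(" ")
        # rstrip() removed the newlines, so add one back on every output line
        o_fastq.write(fields[0] + "_MolecularBarcode:" + barcode + " " + fields[1] + "\n")
        o_fastq.write(totseq[barcode_len:] + "\n")
        o_fastq.write(plus + "\n")
        o_fastq.write(qual[barcode_len:] + "\n")
        counts[barcode] += 1
    return counts

# Usage: the second record has an empty read and is skipped.
demo_in = io.StringIO(
    "@r1 1:N:0\nACGTACGGTT\n+\nIIIIIIIIII\n"
    "@r2 1:N:0\n\n+\n\n"
)
demo_out = io.StringIO()
counts = process_fastq(demo_in, demo_out)
print(demo_out.getvalue())
print(counts["ACGTAC"])  # 1
```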