I have fastq files that I want to parse. One read is shown below; each file contains thousands of them:
@PSI179204_0037:4:1:2139:945#0/2
AGAGATCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGCGAGCCTGAACCAGCCAAGTAGCGTGAGGGACGACTGCCCTACGGGTTGTAAACCTCTTTTGTTCGGGAATAAAGTGCGGCACGCGTGCCGGTTTGTATGTCCCGTTCGAATAG
+PSI179204_0037:4:1:2139:945#0/2
ghhhhhhhhhhhfhdhhhfhhhhhgeeghhhdghfgheh[hhfhfhhhhehghffcahhhhfgcfgeaegd_ah_aaOa[a[aW___W^`a`b`da`ZXO]N^``BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB^C
My goal is to get them into a dictionary like the one below (each line is shortened here):
{'@PSI179204_0037:4': ['AGAGATCCTACG', '+PSI179204_0', 'ghhhhhhhhh']}
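I saw here that you can use one line as a key and then call next(filename) to take the following line as the value, so I tried that, but with three next(filename) calls, as in the following code: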
file1 = open(inputfile1, 'r')
file2 = open(inputfile2, 'r')
File1dict = {}
File2dict = {}
for key in file1:
    File1dict[key.strip()] = next(file1) = next(file1) = next(file1)
print (File1dict)
for key in file2:
    File2dict[key.strip()] = next(file2) = next(file2) = next(file2)
print (File2dict)
Currently I get the following error:
File "Dict_maybesworking.py", line 31
File1dict[key.strip()] = next(file1) = next(file1) = next(file1)
SyntaxError: can't assign to function call
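The error comes from chained assignment: in a = b = c, every expression to the left of the last = is an assignment target, and a call such as next(file1) cannot be a target. The sketch below (the helper name compiles is hypothetical) uses the built-in compile() to check both forms without running them:

```python
# In a chained assignment "a = b = c = value", a, b and c are all targets.
# next(f) is a function call, not a name/attribute/subscript, so it is rejected.
bad_source = "d[k] = next(f) = next(f) = next(f)"
# Collecting the three calls into a list gives a single target, d[k].
good_source = "d[k] = [next(f), next(f), next(f)]"


def compiles(source):
    """Return True if `source` is syntactically valid Python, else False."""
    try:
        compile(source, "<example>", "exec")  # only parses/compiles, never executes
        return True
    except SyntaxError:
        return False


print(compiles(bad_source))   # False
print(compiles(good_source))  # True
```

The exact wording of the message differs between Python versions, but both reject the chained form at compile time, before any line of the file is read.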
Does anyone know how to make this code work, or, if not, an alternative approach?
The whole script is below:
from __future__ import print_function
from collections import defaultdict
from itertools import groupby
import argparse
from itertools import izip
parser = argparse.ArgumentParser() # simplifies the wording of using argparse, as stated in the Python tutorial
parser.add_argument("-r1", type=str, action='store', dest='input1', help="input the forward read file") # allows input of the forward read
parser.add_argument("-r2", type=str, action='store', dest='input2', help="input the reverse read file") # allows input of the reverse read
parser.add_argument("-v", "--verbose", action="store_true", help=" Increases the output, only needs to be used to provide feedback for debugging")
parser.add_argument("-u", type=str, action='store', dest='unique', help="Unique instrument number for fastq file required") # allows input of the unique instrument number
parser.add_argument("-n", action="count", default=0, help="Allows for up to 5 mismatches, Default is 0")
parser.add_argument("-o", "--output", help="Directs the output to a name of your choice")
args = parser.parse_args()
Uni = str(args.unique)
inputfile1 = str(args.input1)
inputfile2 = str(args.input2)
output = str(args.output)
output_file= open(output, "w")
Unmatched_1 = open('Unmatched_1', "a")
Unmatched_2 = open('Unmatched_2', "a")
file1 = open(inputfile1, 'r')
file2 = open(inputfile2, 'r')
File1dict = {}
File2dict = {}
for key in file1:
    File1dict[key.strip()] = [file1.next(), file1.next(), file1.next()]
print (File1dict)
for key in file2:
    File2dict[key.strip()] = [file2.next(), file2.next(), file2.next()]
print (File2dict)
Command-line usage:
python Dict_maybesworking.py -r1 Real_test_1 -r2 Real_test_2 -u PSI179204 -o file_result
Answer 0 (score: 3)
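Since file objects are iterable, you can iterate over the file to get the keys as you do now, and then take the next 3 items from the same iterator with a slice to get the values, for example: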
from itertools import islice
with open('file1') as fin:
    stripped_lines = (line.strip() for line in fin)
    f1dict = {key: list(islice(stripped_lines, 3)) for key in stripped_lines}
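To try this without a file on disk, the sketch below wraps the same comprehension in a function and feeds it an in-memory io.StringIO. The sample data is a hypothetical, shortened version of the read in the question plus one made-up record, and the function name fastq_to_dict is illustrative. It assumes every record is exactly 4 lines (no wrapped sequence or quality lines).

```python
import io
from itertools import islice

# Hypothetical, shortened sample in the same 4-line layout as the question's read.
SAMPLE = (
    "@PSI179204_0037:4:1:2139:945#0/2\n"
    "AGAGATCCTACG\n"
    "+PSI179204_0037:4:1:2139:945#0/2\n"
    "ghhhhhhhhhhh\n"
    "@PSI179204_0037:4:1:2140:950#0/2\n"  # made-up second record
    "TTGACCGTAAGC\n"
    "+PSI179204_0037:4:1:2140:950#0/2\n"
    "hhhhgfhhhhhh\n"
)


def fastq_to_dict(handle):
    """Map each header line to a list of the 3 lines that follow it."""
    stripped_lines = (line.strip() for line in handle)
    # The comprehension takes one line as the key, then islice takes the next
    # 3 lines from the SAME generator, so the loop resumes at the next header.
    return {key: list(islice(stripped_lines, 3)) for key in stripped_lines}


records = fastq_to_dict(io.StringIO(SAMPLE))
print(records["@PSI179204_0037:4:1:2139:945#0/2"])
```

Because the keys are chosen purely by position, a quality line that happens to start with "@" cannot be mistaken for a header, which is a common pitfall when splitting fastq files on "@".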
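Note that "for line in fin" consumes one line at a time, whereas list(islice(fin, 3)) consumes the next 3 lines of fin, so the following iteration of the for loop picks up the line after those, and so on.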
For example:
>>> from itertools import islice
>>> r = range(20)
>>> i = iter(r)
>>> {key: list(islice(i, 3)) for key in i}
{0: [1, 2, 3], 8: [9, 10, 11], 4: [5, 6, 7], 12: [13, 14, 15], 16: [17, 18, 19]}
Answer 1 (score: 0)
You are not using the next method correctly; you have to put the new values into a list or some other data structure:
File1dict[key.strip()] = [file1.next(), file1.next(), file1.next()]
....
File2dict[key.strip()] = [file2.next(), file2.next(), file2.next()]
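A minimal sketch of the same idea as a self-contained function, assuming strict 4-line records. It uses the built-in next(file1) rather than file1.next(): the method form exists only in Python 2, while the built-in works in Python 2.6+ and Python 3. The function name fastq_to_dict_next and the sample data are hypothetical.

```python
import io

# Hypothetical, shortened single record in the question's 4-line layout.
SAMPLE = (
    "@PSI179204_0037:4:1:2139:945#0/2\n"
    "AGAGATCCTACG\n"
    "+PSI179204_0037:4:1:2139:945#0/2\n"
    "ghhhhhhhhhhh\n"
)


def fastq_to_dict_next(handle):
    """Build {header: [sequence, plus_line, quality]} with next() on one iterator."""
    records = {}
    lines = iter(handle)  # a file object is its own iterator
    for key in lines:
        # Each next() advances the same iterator the for loop uses, so the
        # loop's next pass starts at the following header. The three values
        # go into one list: a single assignment target, not a chain of '='.
        records[key.strip()] = [next(lines).strip(),
                                next(lines).strip(),
                                next(lines).strip()]
    return records


print(fastq_to_dict_next(io.StringIO(SAMPLE)))
```

If a file ends in the middle of a record, the bare next() calls raise StopIteration; passing a default, as in next(lines, ''), is one way to tolerate a truncated final record instead.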