My data looks like this:
-HI5UHB101EPGLJ rank=0000024 x=1813.0 y=437.0 length=81
ACGTAGATCGTGTAGCTGAGGATGTTGACAACCATGTGGACAGAGCCTCACCATCAACAT
CCTCAGCTACACGATCTGCGT
-HI5UHB101BDVPE rank=0000032 x=451.5 y=48.0 length=73
ACGTAGATCGTCTTGAGTGATTACAGATCTAATACAATGTGCAGTCTAGCTAGATGTTAT
TCTATATATATAC
-HI5UHB101AL8KC rank=0000049 x=136.0 y=586.0 length=58
ACGTAGATCGTCTCGGCTAGTAGACGAGCCATCGTCTACTAGCCGAGACGATCTGCGT
How can I turn it into a CSV table like the following:
'HI5UHB101EPGLJ', 'rank=0000024', 'x=1813.0', 'y=437.0', 'length=81','ACGTAGATCGTGTAGCTGAGGATGTTGACAACCATGTGGACAGAGCCTCACCATCAACATCCTCAGCTACACGATCTGCGT'
'HI5UHB101BDVPE', 'rank=0000032', 'x=451.5', 'y=48.0', 'length=73', 'ACGTAGATCGTCTTGAGTGATTACAGATCTAATACAATGTGCAGTCTAGCTAGATGTTATTCTATATATATAC'
'HI5UHB101AL8KC', 'rank=0000049', 'x=136.0', 'y=586.0', 'length=58', 'ACGTAGATCGTCTCGGCTAGTAGACGAGCCATCGTCTACTAGCCGAGACGATCTGCGT'
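My main problem is that there is a line break (\n) after the "length" field, and the letter sequence itself is split by further line breaks (\n) when I need it joined into one string.
The letter sequences differ in length, so the number of lines per sequence varies.
Any help would be greatly appreciated. This will be run on a huge file.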
Answer 0 (score: 4)
Use a generator function to group the lines into sections, each starting at a line that begins with -:
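def per_section(iterable):
    row = []
    for line in iterable:
        if line.startswith('-'):
            # A new header line: emit the previous record, if any
            if row:
                yield row
            # Header fields, plus an empty slot that collects the sequence
            row = line[1:].split() + ['']
        else:
            # Sequence line: append it to the last field
            row[-1] += line.strip()
    # Emit the final record
    if row:
        yield row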
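This yields complete, reassembled sections one at a time, ready to be written to CSV. Because it is a generator, the whole file is never held in memory.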
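import csv

# Python 3: open the output in text mode with newline='' (on Python 2, use 'wb' instead)
with open(inputfile) as infile, open(outputfile, 'w', newline='') as outfile:
    csvwriter = csv.writer(outfile)
    csvwriter.writerows(per_section(infile))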
For your sample input, this produces:
HI5UHB101EPGLJ,rank=0000024,x=1813.0,y=437.0,length=81,ACGTAGATCGTGTAGCTGAGGATGTTGACAACCATGTGGACAGAGCCTCACCATCAACATCCTCAGCTACACGATCTGCGT
HI5UHB101BDVPE,rank=0000032,x=451.5,y=48.0,length=73,ACGTAGATCGTCTTGAGTGATTACAGATCTAATACAATGTGCAGTCTAGCTAGATGTTATTCTATATATATAC
HI5UHB101AL8KC,rank=0000049,x=136.0,y=586.0,length=58,ACGTAGATCGTCTCGGCTAGTAGACGAGCCATCGTCTACTAGCCGAGACGATCTGCGT
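If the single-quoted fields shown in the question are actually required, one possible approach is to ask csv.writer to quote every field with a single quote. The following is only a sketch: it assumes Python 3, repeats per_section so it runs on its own, and uses an in-memory io.StringIO with two of the sample records in place of the real input and output files. Note that csv.writer does not emit a space after each comma, so the result only approximates the format in the question.

```python
import csv
import io

def per_section(iterable):
    """Yield one row per record: header fields plus the joined sequence."""
    row = []
    for line in iterable:
        if line.startswith('-'):
            if row:
                yield row
            row = line[1:].split() + ['']
        else:
            row[-1] += line.strip()
    if row:
        yield row

# Two records from the question, held in memory instead of a real file.
sample = io.StringIO(
    "-HI5UHB101BDVPE rank=0000032 x=451.5 y=48.0 length=73\n"
    "ACGTAGATCGTCTTGAGTGATTACAGATCTAATACAATGTGCAGTCTAGCTAGATGTTAT\n"
    "TCTATATATATAC\n"
    "-HI5UHB101AL8KC rank=0000049 x=136.0 y=586.0 length=58\n"
    "ACGTAGATCGTCTCGGCTAGTAGACGAGCCATCGTCTACTAGCCGAGACGATCTGCGT\n"
)

out = io.StringIO()
# quotechar and quoting are standard csv options; QUOTE_ALL quotes every field.
writer = csv.writer(out, quotechar="'", quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(per_section(sample))
result = out.getvalue()
print(result)
```

For the real file, the two StringIO objects would be replaced by the opened input and output files; the generator still processes one record at a time.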
Answer 1 (score: 0)
Something like this should work...
with open("data.txt") as f, open("done.txt", "w") as fo:
    line = f.readline()
    while line:
        if line.startswith('-'):
            # Header line: label, rank, x, y and length, separated by whitespace
            label, rank, xval, yval, lenval = line.split()
            # Collect sequence lines until the next header or the end of the file
            code = ""
            line = f.readline()
            while line and not line.startswith('-'):
                code += line.strip()
                line = f.readline()
            # Quote every field and join with ", " as in the requested output
            fields = [label[1:], rank, xval, yval, lenval, code]
            fo.write(", ".join("'" + field + "'" for field in fields) + "\n")
        else:
            # Skip anything that appears before the first header
            line = f.readline()