I have the file below, and I want to convert what is written on every fourth line into numbers.
sample.fastq
@HISE
GGATCGCAATGGGTA
+
CC@!$%*&J#':AAA
@HISE
ATCGATCGATCGATA
+
()**D12EFHI@$;;
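Every fourth line is a string of characters, and each character on its own corresponds to a number (stored in a dictionary). I want to convert each character to its corresponding number and then find the average of all those numbers on that line.
I've managed to print each character individually, but I'm stuck on how to replace the characters with their numbers and carry on from there.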
script.py
d = {
'!':0, '"':1, '#':2, '$':3, '%':4, '&':5, '\'':6, '(':7, ')':8,
'*':9, '+':10, ',':11, '-':12, '.':13, '/':14, '0':15,'1':16,
'2':17, '3':18, '4':19, '5':20, '6':21, '7':22, '8':23, '9':24,
':':25, ';':26, '<':27, '=':28, '>':29, '?':30, '@':31, 'A':32, 'B':33,
'C':34, 'D':35, 'E':36, 'F':37, 'G':38, 'H':39, 'I':40, 'J':41 }
with open('sample.fastq') as fin:
    # [3::4] selects the 4th, 8th, 12th, ... lines (the quality lines)
    for i in fin.readlines()[3::4]:
        for j in i:
            print(j)
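The dictionary in script.py follows a regular pattern: each value is the character's ASCII code minus 33, which is the Phred+33 quality encoding used by Sanger-style FASTQ files ('!' is ASCII 33 and maps to 0, 'J' is ASCII 74 and maps to 41). A minimal sketch, assuming that offset holds for every character in your files, builds the same table programmatically; the helper name `phred33` is hypothetical.

```python
# Build the character -> score table from the Phred+33 offset instead of
# typing it out by hand: '!' (ASCII 33) -> 0, ..., 'J' (ASCII 74) -> 41.
d = {chr(score + 33): score for score in range(42)}


def phred33(ch):
    """Hypothetical helper: score of one quality character (ASCII code minus 33)."""
    return ord(ch) - 33


# Spot checks against the hand-written dictionary in the question
print(d['!'], d['@'], d['J'])         # 0 31 41
print([phred33(c) for c in "CC@!"])   # [34, 34, 31, 0]
```

Either form works with the answers: `d[ch]` and `phred33(ch)` return the same number for every key in the table.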
The output should look like this and be stored in a new file:
output.txt
@HISE
GGATCGCAATGGGTA
+
19 #From 34 34 31 0 3 4 9 5 41 2 6 25 32 32 32
@HISE
ATCGATCGATCGATA
+
23 #From 7 8 9 9 35 16 17 36 37 39 40 31 3 26 26
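As a check on these expected values: the first quality line sums to 290 and the second to 339, each over 15 characters, so the exact means are about 19.33 and 22.6. The 19 and 23 above are those means rounded to the nearest integer, whereas truncating (integer) division would give 19 and 22. A short sketch, with a hypothetical `mean_quality` helper and the same Phred+33 table as the dictionary in script.py:

```python
# Same table as the question's dictionary: '!' -> 0 ... 'J' -> 41 (Phred+33)
d = {chr(score + 33): score for score in range(42)}


def mean_quality(line):
    """Hypothetical helper: average score of one quality line."""
    scores = [d[ch] for ch in line.rstrip("\n")]
    return sum(scores) / len(scores)


for qual in ["CC@!$%*&J#':AAA", "()**D12EFHI@$;;"]:
    exact = mean_quality(qual)
    # round() gives the values shown in output.txt; int() truncates toward zero
    print(f"{exact:.2f}", round(exact), int(exact))
```

This prints `19.33 19 19` and then `22.60 23 22`.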
Is what I'm proposing possible?
Answer 0 (score: 1)
You can do this with a single for loop over the lines of the input file:
# d is the character -> score dictionary from the question
with open('sample.fastq') as fin, open('outfile.fastq', "w") as outf:
    for i, line in enumerate(fin):
        if i % 4 == 3:  # only change every fourth line (0-based index 3, 7, ...)
            # strip the trailing newline before looking up each character
            # (rstrip also copes with a last line that has no newline)
            qualities = [d[ch] for ch in line.rstrip("\n")]
            # take the average quality score; as in your example, integer
            # division (//) truncates it to an integer
            average = sum(qualities) // len(qualities)
            # new version: the average, with \n at the end
            line = str(average) + "\n"
        # write the line (or its new version)
        outf.write(line)
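This produces the requested output, except that integer division truncates the second average (339 / 15 = 22.6) to 22, whereas the question shows it rounded to 23: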
@HISE
GGATCGCAATGGGTA
+
19
@HISE
ATCGATCGATCGATA
+
22
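If you want the 23 from the question rather than 22, round instead of truncating. A sketch of the same loop, wrapped in a hypothetical `convert_fastq` generator that accepts any iterable of lines, so it can be tried on an in-memory string before pointing it at real files (the Phred+33 table below is equivalent to the question's dictionary):

```python
import io

# Phred+33 table, equivalent to the dictionary in the question
d = {chr(score + 33): score for score in range(42)}


def convert_fastq(lines):
    """Hypothetical helper: yield the input lines, replacing every fourth
    line (the quality line) with its average score rounded to an integer."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            scores = [d[ch] for ch in line.rstrip("\n")]
            line = str(round(sum(scores) / len(scores))) + "\n"
        yield line


sample = ("@HISE\nGGATCGCAATGGGTA\n+\nCC@!$%*&J#':AAA\n"
          "@HISE\nATCGATCGATCGATA\n+\n()**D12EFHI@$;;\n")
print("".join(convert_fastq(io.StringIO(sample))), end="")
```

With files, `outf.writelines(convert_fastq(fin))` inside the same `with` statement as in the answer writes the result. Note that Python 3's `round()` rounds exact .5 ties to the nearest even integer (22.5 becomes 22), which may or may not match how the expected output was produced.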
Answer 1 (score: 0)
Assuming you read from stdin and write to stdout:
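from sys import stdin

# d is the character -> score dictionary from the question
for i, line in enumerate(stdin, 1):
    line = line.rstrip("\n")  # remove the trailing newline
    if i % 4 != 0:  # pass lines 1-3 of each record through unchanged
        print(line)
        continue
    # every fourth line: map each character to its score and average them
    nums = [d[c] for c in line]
    print(sum(nums) / float(len(nums)))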
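A variant of the same idea reads the input four lines at a time, so each record's header, sequence, separator, and quality line get their own names instead of relying on a running counter. This is a sketch under the assumption that every record is exactly four lines; the helper names `records`, `mean_score`, and `main` are hypothetical, and the table is the Phred+33 equivalent of the question's dictionary.

```python
import sys

# Phred+33 table, equivalent to the dictionary in the question
d = {chr(score + 33): score for score in range(42)}


def records(lines):
    """Hypothetical helper: group an iterable of lines into 4-line FASTQ
    records (header, sequence, '+', quality) with newlines stripped.
    A trailing incomplete record is silently dropped by zip()."""
    it = (line.rstrip("\n") for line in lines)
    # Passing the same iterator four times makes zip() take 4 consecutive lines
    return zip(it, it, it, it)


def mean_score(qual):
    """Average Phred+33 score of one quality string, as a float."""
    nums = [d[c] for c in qual]
    return sum(nums) / len(nums)


def main():
    # Same stdin -> stdout behaviour as the answer, one record at a time
    for header, seq, plus, qual in records(sys.stdin):
        print(header, seq, plus, mean_score(qual), sep="\n")
```

Calling `main()` at the end of the script and running `python script.py < sample.fastq > output.txt` gives the same output as the answer. Because `zip()` stops at the shortest input, a truncated final record is dropped rather than reported, so check the line count first if that matters.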