Question

I have the file below, and I want to convert the contents of every fourth line into numbers.

sample.fastq

@HISE
GGATCGCAATGGGTA
+
CC@!$%*&J#':AAA
@HISE
ATCGATCGATCGATA
+
()**D12EFHI@$;;

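Every fourth line is a string of characters, and each character corresponds to a number (stored in a dictionary). I want to convert each character to its number and then compute the average of all those numbers on the line.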
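
As a minimal sketch of that per-line conversion: every value in the dictionary from script.py equals the character's ASCII code minus 33, so this example uses ord(ch) - 33 in place of the lookup. The function name phred_scores is hypothetical and chosen only for illustration.

```python
def phred_scores(quality_line):
    """Map each quality character to its number."""
    # Assumption: matches the question's dictionary, where d[ch] == ord(ch) - 33
    # for every key from '!' (0) through 'J' (41).
    return [ord(ch) - 33 for ch in quality_line]


scores = phred_scores("CC@!$%*&J#':AAA")
mean = sum(scores) / float(len(scores))  # float() keeps the fraction on Python 2 as well
print(scores)
print(mean)
```

For the first quality line of sample.fastq this gives the same 15 numbers listed after "#From" in the expected output, and a mean a little above 19.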

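I have managed to print each character individually, but I'm stumped on how to replace the characters with their numbers and carry on from there.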

script.py

d = {
'!':0, '"':1, '#':2, '$':3, '%':4, '&':5, '\'':6, '(':7, ')':8,
'*':9, '+':10, ',':11, '-':12, '.':13, '/':14, '0':15,'1':16,
'2':17, '3':18, '4':19, '5':20, '6':21, '7':22, '8':23, '9':24,
':':25, ';':26, '<':27, '=':28, '>':29, '?':30, '@':31, 'A':32, 'B':33,
'C':34, 'D':35, 'E':36, 'F':37, 'G':38, 'H':39, 'I':40, 'J':41 }


with open('sample.fastq') as fin:
    # every fourth line, starting with the fourth
    for i in fin.readlines()[3::4]:
        for j in i:
            print(j)

The output should look like this and be stored in a new file.

output.txt

@HISE
GGATCGCAATGGGTA
+
19 #From 34 34 31 0 3 4 9 5 41 2 6 25 32 32 32
@HISE
ATCGATCGATCGATA
+
23 #From 7 8 9 9 35 16 17 36 37 39 40 31 3 26 26

Is what I'm proposing possible?

Answer 1

You can do this with a for loop over the lines of the input file:

with open('sample.fastq') as fin, open('outfile.fastq', "w") as outf:
    for i, line in enumerate(fin):
        if i % 4 == 3:  # only change every fourth line
            # strip the trailing newline before looking up each character
            qualities = [d[ch] for ch in line.rstrip("\n")]
            # take the average quality score. Note that floor division (//)
            # truncates the average to an integer on both Python 2 and 3
            average = sum(qualities) // len(qualities)
            # new version; average with \n at end
            line = str(average) + "\n"

        # write line (or new version thereof)
        outf.write(line)

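This produces the output below. Note that the second average is truncated to 22 (339 / 15 = 22.6), whereas your example rounds it to 23: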

@HISE
GGATCGCAATGGGTA
+
19
@HISE
ATCGATCGATCGATA
+
22
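
If the rounded value from the question (23 rather than 22) is wanted, one option is to round the mean instead of truncating it. The following is only a sketch; rounded_mean is a hypothetical helper name, and the two lists are the numbers given after "#From" in the question.

```python
def rounded_mean(scores):
    """Return the mean of scores rounded to the nearest integer."""
    # int(x + 0.5) rounds halves up for non-negative x; Python 3's built-in
    # round() would instead round exact halves to the nearest even integer.
    return int(sum(scores) / float(len(scores)) + 0.5)


first = [34, 34, 31, 0, 3, 4, 9, 5, 41, 2, 6, 25, 32, 32, 32]
second = [7, 8, 9, 9, 35, 16, 17, 36, 37, 39, 40, 31, 3, 26, 26]
print(rounded_mean(first))   # 19  (290 / 15 = 19.33...)
print(rounded_mean(second))  # 23  (339 / 15 = 22.6)
```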

Answer 2

Assuming you read from stdin and write to stdout:

from sys import stdin

# d is the dictionary from the question
for i, line in enumerate(stdin, 1):
    line = line.rstrip("\n")  # Remove newline
    if i % 4 != 0:
        print(line)
        continue
    nums = [d[c] for c in line]
    print(sum(nums) / float(len(nums)))
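
The same loop can be wrapped in a function that takes any pair of text streams, which makes it easy to try on in-memory text before pointing it at stdin/stdout or real files. This is a sketch, not a definitive implementation: convert_quality_lines is a hypothetical name, and the dictionary is rebuilt programmatically on the assumption that it should hold exactly the characters '!' through 'J' as in the question.

```python
import io

# Assumption: same contents as the question's d, i.e. '!' -> 0 up to 'J' -> 41.
d = {chr(code): code - 33 for code in range(33, 75)}


def convert_quality_lines(source, sink):
    """Copy lines from source to sink, replacing every fourth line with its mean score."""
    for i, line in enumerate(source, 1):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # A character outside '!'..'J' raises KeyError here;
            # an empty quality line would raise ZeroDivisionError.
            nums = [d[ch] for ch in line]
            line = str(sum(nums) / float(len(nums)))
        sink.write(line + "\n")


sample = io.StringIO("@HISE\nGGATCGCAATGGGTA\n+\nCC@!$%*&J#':AAA\n")
result = io.StringIO()
convert_quality_lines(sample, result)
print(result.getvalue())
```

Calling convert_quality_lines(sys.stdin, sys.stdout) after import sys should behave like the loop above, and two objects returned by open() work the same way for files.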

Read and replace the contents of a line using a dictionary
