Question

I have a file like this:

chr1 1 61
chr1 2 61
chr1 3 62
chr1 4 63
... ... ...
chr1 5001 88
chr1 5002 90
... ... ...

I want to get the minimum sequencing depth in each 5000 bp window:

chr1 1 5000 58
chr1 5001 10000 62
chr1 10001 15000 34
... ... ... ...

I have already tried Python, and it works:

from itertools import islice
import sys

bs = 5000  # window size in bp

repo = {}  # chromosome -> list of [position, coverage]
with open(sys.argv[1]) as f:
    for li in f:
        chrom, pos, cov = li.split()
        pos, cov = int(pos), int(cov)
        if chrom not in repo:
            # Pad positions before the first reported one with zero coverage
            repo[chrom] = [[i, 0] for i in range(1, pos)]
        repo[chrom].append([pos, cov])

with open("%s.txt" % sys.argv[2], "w") as res:
    for key in repo:
        ll = iter(repo[key])
        while True:
            window = list(islice(ll, bs))  # next bs records for this chromosome
            if not window:
                break
            covs = [c for _, c in window]
            s, e = window[0][0], window[-1][0]
            # Divide by the actual window size so a short last window is not underestimated
            me = sum(covs) / len(window)
            ma, mi = max(covs), min(covs)
            res.write("\t".join(map(str, [key, s, e, me, ma, mi])) + "\n")
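
For comparison, here is a minimal sketch of a streaming variant that keeps only the running minimum of the current window instead of storing every position. It is not the same as the script above: it assigns each record to a window by coordinate, `(pos - 1) // 5000`, rather than by counting 5000 records, so the two differ whenever positions are missing from the input. It assumes the input is sorted by chromosome and position, ignores absent positions instead of treating them as depth 0, and emits no row for a window with no records. The names `window_minimums` and `WINDOW` are made up for this illustration.

```python
WINDOW = 5000  # window size in bp, as in the question


def window_minimums(lines, window=WINDOW):
    """Yield (chrom, start, end, min_depth) for each fixed-size window.

    Assumption: lines are 'chrom pos depth', sorted by chromosome and position.
    Positions absent from the input are ignored, not counted as depth 0.
    """
    current = None  # (chrom, zero-based window index) being accumulated
    lowest = None   # minimum depth seen so far in the current window
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        chrom, pos, depth = fields[0], int(fields[1]), int(fields[2])
        key = (chrom, (pos - 1) // window)
        if key != current:
            if current is not None:
                yield _row(current, lowest, window)
            current, lowest = key, depth
        else:
            lowest = min(lowest, depth)
    if current is not None:
        yield _row(current, lowest, window)


def _row(key, lowest, window):
    """Turn a (chrom, window index) key into 1-based inclusive coordinates."""
    chrom, index = key
    return (chrom, index * window + 1, (index + 1) * window, lowest)
```

Passing an open file handle as `lines` and writing each yielded tuple tab-separated gives the four-column layout shown in the desired output.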

But I would like to know whether this can be solved easily with AWK. Thanks for your help!

Using awk to get the minimum read depth per bin from samtools depth output

0 answers: