I have a file like this:
chr1 1 61
chr1 2 61
chr1 3 62
chr1 4 63
... ... ...
chr1 5001 88
chr1 5002 90
... ... ...
I want the minimum sequencing depth in each 5000 bp window:
chr1 1 5000 58
chr1 5001 10000 62
chr1 10001 15000 34
... ... ... ...
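The windows in the desired output are fixed 1-based intervals (1-5000, 5001-10000, 10001-15000, ...). A minimal sketch of how a position maps to its window, assuming 1-based positions; `window_bounds` is a hypothetical helper name, not part of the original script:

```python
def window_bounds(pos: int, size: int = 5000) -> tuple:
    """Return (start, end) of the 1-based window that contains pos."""
    index = (pos - 1) // size  # 0 for 1..5000, 1 for 5001..10000, ...
    start = index * size + 1
    return start, start + size - 1
```

For example, positions 5000 and 5001 fall into different windows: `(1, 5000)` and `(5001, 10000)`.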
I tried it in Python and it works (the script also reports the mean and maximum depth for each window):
from itertools import islice
import sys

bs = 5000  # window size in bp

# Read per-base depth lines: chrom, pos, coverage
with open(sys.argv[1]) as f:
    repo = {}
    for li in f:
        chrom, pos, cov = li.split()
        pos, cov = int(pos), int(cov)
        if chrom not in repo:
            repo[chrom] = []
            # Pad positions before the first reported one with depth 0
            if pos != 1:
                for i in range(1, pos):
                    repo[chrom].append([i, 0])
        repo[chrom].append([pos, cov])

# Write chrom, start, end, mean, max, min for every block of bs positions
with open("%s.txt" % sys.argv[2], "w") as res:
    for key in repo:
        ll = iter(repo[key])
        while True:
            window = list(islice(ll, bs))
            if not window:
                break
            s, e = window[0][0], window[-1][0]
            depths = [i[1] for i in window]
            me = sum(depths) / len(depths)  # mean over the positions actually in this window
            ma, mi = max(depths), min(depths)
            res.write("\t".join(map(str, [key, s, e, me, ma, mi])) + "\n")
But I wonder whether this can be done easily with AWK. Thanks for your help!
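For comparison, here is a minimal single-pass sketch in Python of the logic an AWK solution would typically use: keep a running minimum keyed on chromosome and window index instead of storing every position in memory. `window_minima` is a hypothetical name. The sketch assumes whitespace-separated `chrom pos depth` lines with 1-based positions. Unlike the script above, it assigns windows by coordinate rather than by counting 5000 lines, and positions missing from the file are skipped rather than counted as depth 0.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def window_minima(lines: Iterable[str], size: int = 5000) -> List[Tuple[str, int, int, int]]:
    """Return (chrom, start, end, min_depth) for each window seen, in input order.

    Assumption: positions absent from the input are not treated as depth 0.
    """
    minima: Dict[Tuple[str, int], int] = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, pos, depth = fields[0], int(fields[1]), int(fields[2])
        key = (chrom, (pos - 1) // size)  # window index, 0-based
        if key not in minima or depth < minima[key]:
            minima[key] = depth
    return [(c, w * size + 1, (w + 1) * size, low) for (c, w), low in minima.items()]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for chrom, start, end, low in window_minima(f):
            print(chrom, start, end, low, sep="\t")
```

Run it as `python script.py depth.txt` (script name hypothetical); it prints one tab-separated `chrom start end min` line per window.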