Question

I have been using pytabix to read a .gz file. I'm not sure what is wrong with my code, because it used to work:

import tabix

tb = tabix.open('qcat.gz')
coor = "chr10:6000-30000"
records = tb.querys(coor)  # region string "chrom:start-end"
for res in records:
    print(res)

I keep getting this error:

tabix.TabixError: query failed

It looks like there is no .tbi index file.
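A likely cause of `query failed` is that tabix cannot find the index, which it expects next to the data file with `.tbi` appended (for example `qcat.gz.tbi`). A quick check before calling `tabix.open` turns that into a clearer error. This is a minimal sketch; `require_tabix_index` is a hypothetical helper, not part of pytabix:

```python
import os


def require_tabix_index(gz_path):
    """Raise FileNotFoundError if gz_path or its tabix index is missing.

    Hypothetical helper: assumes the index sits next to the data file,
    named by appending '.tbi' (e.g. 'qcat.gz' -> 'qcat.gz.tbi').
    """
    if not os.path.exists(gz_path):
        raise FileNotFoundError("data file not found: " + gz_path)
    index_path = gz_path + ".tbi"
    if not os.path.exists(index_path):
        raise FileNotFoundError(
            "no tabix index found at " + index_path
            + "; run tabix on the bgzip-compressed file first"
        )
    return index_path


# Usage, before tabix.open: require_tabix_index('qcat.gz')
```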

Answer 1

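You have to run the tabix prerequisites (compressing with bgzip, then indexing with tabix) before any query can run. pytabix's query methods (tb.query / tb.querys) do not perform these steps for you.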
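Both bgzip output and ordinary gzip output end in `.gz`, but tabix only works with bgzip's BGZF format. A BGZF block is a gzip member whose header carries an extra field with the subfield ID `BC`, so the first 18 bytes are enough to tell the two apart. A minimal sketch; `is_bgzf` is a hypothetical helper that checks only the first block's header:

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF (bgzip) block header."""
    with open(path, "rb") as fh:
        header = fh.read(18)
    if len(header) < 18:
        return False
    # gzip magic bytes, deflate method (8), and the FEXTRA flag (bit 2) set
    if header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
        return False
    (xlen,) = struct.unpack("<H", header[10:12])  # length of the extra field
    (slen,) = struct.unpack("<H", header[14:16])  # length of the subfield data
    # BGZF: a 6-byte extra field holding subfield 'BC' with 2 data bytes
    return xlen == 6 and header[12:14] == b"BC" and slen == 2


# Usage: if not is_bgzf('qcat.gz'), recompress it with bgzip before indexing
```

A file made with plain `gzip` fails this check and needs to be decompressed and recompressed with bgzip before tabix can index it.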

If your file is already compressed with bgzip, you only need to build the index:

from subprocess import Popen, PIPE
import shlex

zippedf = 'qcat.gz'

def tabix_index(zippedf):
    p = Popen(['tabix', '-f', zippedf], stdout=PIPE)
    # or: cmd = "tabix -f " + zippedf
    #     p = Popen(shlex.split(cmd), stdout=PIPE)
    # (shlex.split splits cmd on spaces)
    p.wait()  # wait for tabix to finish writing the .tbi index

tabix_index(zippedf)
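A variant of the `tabix_index` function above, built on `subprocess.run` with `check=True`, raises `CalledProcessError` when tabix exits with an error instead of failing silently. It can also pass tabix's `-p` preset option (`gff`, `bed`, `sam` or `vcf`) so tabix knows which columns hold the chromosome, start and end. This is a sketch; the names `tabix_cmd` and `tabix_index_checked` are hypothetical:

```python
import subprocess


def tabix_cmd(gz_path, preset=None):
    """Build the tabix command line; preset is e.g. 'bed', 'gff', 'vcf' or 'sam'."""
    cmd = ["tabix", "-f"]  # -f: overwrite an existing index without asking
    if preset is not None:
        cmd += ["-p", preset]  # tell tabix the column layout of the file
    cmd.append(gz_path)
    return cmd


def tabix_index_checked(gz_path, preset=None):
    """Run tabix; raises CalledProcessError if tabix exits non-zero."""
    subprocess.run(tabix_cmd(gz_path, preset), check=True)
    return gz_path + ".tbi"


# Usage (hypothetical BED-formatted file): tabix_index_checked('qcat.gz', preset='bed')
```

Passing a list rather than a single string avoids shell quoting issues, so `shlex.split` is not needed here.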

If your file is not compressed, you can run three subprocesses in sequence to sort, bgzip and index it:

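from subprocess import Popen, PIPE
import shlex

out_sorted = 'myfile.sorted'
out_zipped = out_sorted + ".gz"

with open(out_zipped, 'wb') as sort_zip_out:
    # tabix needs lines sorted by chromosome, then by start position
    # (start assumed to be in column 2, as in BED; adjust -k2,2n otherwise)
    cmd = "sort -k1,1V -k2,2n myfile"
    p1 = Popen(shlex.split(cmd), stdout=PIPE)
    p2 = Popen(['bgzip', '-c', '-f'], stdin=p1.stdout, stdout=sort_zip_out)
    p1.stdout.close()  # so sort gets SIGPIPE if bgzip exits early
    p2.wait()  # wait until bgzip has written all compressed output
    p1.wait()

# when both subprocesses have finished, build the index
tabix_index(out_zipped)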
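tabix expects lines grouped by chromosome and sorted by start position within each chromosome. For small files the same ordering can be produced in Python. This sketch approximates a version-aware sort on column 1 followed by a numeric sort on column 2 with a natural-sort key; it assumes tab-separated lines with no header lines and the start coordinate in column 2 (as in BED). `tabix_sort_key` is a hypothetical name:

```python
import re


def tabix_sort_key(line):
    """Key ordering lines by chromosome (natural order), then start position.

    Assumes tab-separated fields, chromosome in column 1 and start
    coordinate in column 2 (BED-like layout), and no '#' header lines.
    """
    fields = line.split("\t")
    # 'chr10' -> ['chr', 10, ''] so that chr2 sorts before chr10
    chrom_key = [int(tok) if tok.isdigit() else tok
                 for tok in re.split(r"(\d+)", fields[0])]
    return chrom_key, int(fields[1])


lines = ["chr10\t5\t9", "chr2\t100\t200", "chr2\t20\t30", "chr1\t7\t8"]
for line in sorted(lines, key=tabix_sort_key):
    print(line)
```

The sorted lines would still need to be compressed with bgzip and indexed with tabix afterwards; for large files the external `sort` command is the better choice because it does not load the whole file into memory.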
