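After running BedTools coverage, I am writing my results out to a csv file. However, the final data frame splits each record across two rows instead of keeping it on one. I have tried a space, a comma, and a tab as the separator, but none of them keeps the record on a single row or produces the BED format I need. I would really appreciate any help!
The input IMR90 and hESC files look like this: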
track name=IMR90 description=IMR90 color=0,0,0
chr1 226253377 226573378 IMR90b_208
chr1 243133377 243333378 IMR90b_226
chr1 162493376 162533377 IMR90b_145
chr1 230533377 230773378 IMR90b_213
chr1 3610140 3770141 IMR90b_4
chr1 6077413 6277414 IMR90b_5
The loops input file looks like this:
chr11 111240000 111280000 GM12878_replicate
chr14 24810000 24900000 GM12878_replicate
chr1 203250000 203290000 GM12878_replicate
chr12 50040000 50100000 GM12878_replicate
chr1 46510000 46640000 GM12878_replicate
chr1 23880000 23960000 GM12878_replicate
chr12 108970000 109010000 GM12878_replicate
chr8 11280000 11320000 GM12878_replicate
My Python code:
from pybedtools import BedTool
# Read and sort the IMR90 TAD file
IMR90_tad = BedTool('IMR90_hg19_FINAL_W.txt').sort()
# Read and sort the hESC TAD file
hESC_tad = BedTool('hESC_hg19_FINAL_W.txt').sort()
# Read and sort the loops file
loops = BedTool('all_loops_chr_.txt').sort()
# Calculate coverage
coverage_IMR90_tad_cons = loops.coverage(IMR90_tad)
coverage_hESC_tad_cons = loops.coverage(hESC_tad)
# Save as data frames (tab-separated):
coverage_IMR90_tad_cons.to_dataframe().to_csv('coverage_IMR90_tad_cons', sep='\t')
coverage_hESC_tad_cons.to_dataframe().to_csv('coverage_hESC_tad_cons', sep='\t')
# Same output, space-separated:
coverage_IMR90_tad_cons.to_dataframe().to_csv('cov_IMR90_tad_cons', sep=' ')
coverage_hESC_tad_cons.to_dataframe().to_csv('cov_hESC_tad_cons', sep=' ')
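A side note on the `to_csv` calls above: pandas writes the row index and a header line by default, and neither belongs in a BED file. The following is a minimal sketch with made-up values. The column names are illustrative labels of my own for the coverage fields, not names that pybedtools assigns, and the `StringIO` buffer stands in for an output path.

```python
import io

import pandas as pd

# Made-up row in the general shape of bedtools coverage output.
# Column names are illustrative assumptions, not pybedtools defaults.
df = pd.DataFrame(
    [["chr1", 1000, 1100, "example_1", 2, 50, 100, 0.5]],
    columns=["chrom", "start", "end", "name",
             "count", "bases_covered", "length", "fraction"],
)

buf = io.StringIO()  # stands in for an output file path
# index=False drops the 0, 1, 2, ... column; header=False drops the name row.
df.to_csv(buf, sep="\t", index=False, header=False)
bed_text = buf.getvalue()
```

With both options off, each record is written as a single tab-separated line with no extra leading column.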
What the csv file looks like:
chrom start end name score strand thickStart thickEnd
0 chr1 145048643 145368644 hESCb_192
1 23 3632 320001 0.01135
2 chr1 157013376 157093377 hESCb_207
3 10 1902 80001 0.0237747
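One possible cause worth ruling out, and this is only an assumption because the raw bytes of the input files are not shown: if the TAD files were saved with Windows-style CRLF line endings, the name field would end in a stray carriage return, and pandas treats a bare `\r` as a line break when it parses text. The sketch below uses a hand-built line (not real bedtools output) and illustrative column names to show that effect and one way around it.

```python
import io

import pandas as pd

# Hand-built line imitating coverage output whose name field still carries
# a carriage return left over from a CRLF input file (an assumption).
raw = "chr1\t145048643\t145368644\thESCb_192\r\t23\t3632\t320001\t0.01135\n"

# Illustrative column labels, not names assigned by pybedtools.
cols = ["chrom", "start", "end", "name",
        "count", "bases_covered", "length", "fraction"]

# The bare "\r" is read as a line break, so the single record becomes two rows.
split_df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=cols)

# Stripping carriage returns before parsing keeps the record on one row.
clean_text = raw.replace("\r", "")
clean_df = pd.read_csv(io.StringIO(clean_text), sep="\t", header=None, names=cols)

print(len(split_df), len(clean_df))
```

If this turns out to be the cause, converting the input files to Unix line endings before passing them to `BedTool` should avoid the split.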