I have a wig file that I read into a GRanges-like object using a function I wrote that calls the rtracklayer
package:
read_wig <- function(x, format='wig', genome='mm9') {
  suppressMessages(library(rtracklayer))
  merged_wig <- import.wig(x, format=format, genome=genome)
  merged_wig <- keepSeqlevels(merged_wig, paste0('chr', c(seq(1,19), 'X', 'Y')), pruning.mode="coarse")
  return(merged_wig)
}
wig <- read_wig('~/path/to/wig')
The code above returns:
> wig
UCSC track 'MEFES_K27AC.downsampled.sorted'
UCSCData object with 13274466 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [ 1, 200] * | 1
[2] chr1 [201, 400] * | 2
[3] chr1 [401, 600] * | 3
[4] chr1 [601, 800] * | 4
[5] chr1 [801, 1000] * | 0
... ... ... ... . ...
[13274462] chrY [15901401, 15901600] * | 0
[13274463] chrY [15901601, 15901800] * | 0
[13274464] chrY [15901801, 15902000] * | 0
[13274465] chrY [15902001, 15902200] * | 0
[13274466] chrY [15902201, 15902400] * | 0
-------
seqinfo: 21 sequences from mm9 genome
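Now, using this object, I want to compute, for each row (range) in the object, the sum of the scores that fall within a window. For example, I want the sum of the scores over the range 1-10000 (123 in this example), and I want to add it as a column next to score. I want to do this for every row.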
> expected_output
UCSC track 'MEFES_K27AC.downsampled.sorted'
UCSCData object with 13274466 ranges and 1 metadata column:
seqnames ranges strand | score score_10000
<Rle> <IRanges> <Rle> | <numeric> <numeric>
[1] chr1 [ 1, 200] * | 1 123
[2] chr1 [201, 400] * | 2 ...
[3] chr1 [401, 600] * | 3 ...
[4] chr1 [601, 800] * | 4 ...
[5] chr1 [801, 1000] * | 0 ...
... ... ... ... . ...
[13274462] chrY [15901401, 15901600] * | 0 ...
[13274463] chrY [15901601, 15901800] * | 0 ...
[13274464] chrY [15901801, 15902000] * | 0 ...
[13274465] chrY [15902001, 15902200] * | 0 ...
[13274466] chrY [15902201, 15902400] * | 0 ...
-------
seqinfo: 21 sequences from mm9 genome
Ideally, I would like to add columns with the score sums computed over the ranges 1-10000, 1-20000, 1-30000, and so on, up to 100000.
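To make the request concrete, here is a minimal sketch of one way such columns could be computed. It rests on assumptions that are not stated above: the window for each row is taken to start at that row's start position and span the given width on the same chromosome, only ranges lying entirely inside the window are counted, and the ranges within a chromosome do not overlap (as in the 200 bp bins shown). The names `window_sum` and `add_window_sums` are hypothetical helpers, not part of rtracklayer or GenomicRanges.

```r
library(GenomicRanges)

# Hypothetical helper: windowed score sums for a single chromosome.
# Assumes the ranges are sorted by start and do not overlap.
window_sum <- function(start, end, score, width) {
  cs <- c(0, cumsum(score))                      # cs[k + 1] = sum of the first k scores
  last <- findInterval(start + width - 1, end)   # last range ending inside each window
  cs[last + 1] - cs[seq_along(start)]            # sum of scores from row i to row `last`
}

# Hypothetical wrapper: adds one score_<width> column per window width.
add_window_sums <- function(gr, widths = seq(10000, 100000, by = 10000)) {
  gr <- sort(gr, ignore.strand = TRUE)           # order by chromosome, then start
  s <- start(gr)
  e <- end(gr)
  sc <- gr$score
  idx <- split(seq_along(gr), as.character(seqnames(gr)))
  for (w in widths) {
    out <- numeric(length(gr))
    for (i in idx) {
      out[i] <- window_sum(s[i], e[i], sc[i], w)
    }
    # as.integer avoids column names such as "score_1e+05"
    mcols(gr)[[sprintf("score_%d", as.integer(w))]] <- out
  }
  gr
}

# Toy data mirroring the first five rows shown above.
toy <- GRanges("chr1",
               IRanges(start = seq(1, 801, by = 200), width = 200),
               score = c(1, 2, 3, 4, 0))
add_window_sums(toy, widths = c(400, 1000))
```

The cumulative-sum approach is used here because it avoids materialising every range/window overlap, which could become very large with roughly 13 million ranges and 100 kb windows. If the windows are meant to be defined differently (for example, fixed genomic tiles rather than windows anchored at each row), the `findInterval` step would need to change accordingly.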
Any help would be greatly appreciated!
Edit:
The wig file can be found here.