I am trying to compute the percentage of overlap between two data sets using genomic coordinates, subject to certain criteria.
seg2
ID chrom loc.start loc.end num.mark seg.mean
AB 1 3010000 173490000 8430 0.0039
AB 1 173510000 173590000 5 -17.738
AB 1 173610000 173830000 12 0.011
AB 1 173850000 173970000 6 -16.121
AB 2 3090000 181990000 8434 0.011
BB 12 3090000 68990000 2950 -0.2022
BB 12 69010000 87790000 889 0.0267
BB 12 88010000 98550000 507 -0.3337
BB 12 98570000 115090000 800 0.0586
BB 12 115110000 119350000 197 -0.2031
BB 12 119370000 119430000 4 -20.671
over
chr start end CNA sample.ID
1 68580000 68640000 loss 1-68580000-68640000
3 15360000 16000000 loss 3-15360000-16000000
4 122660000 123500000 gain 4-122660000-123500000
7 48320000 48400000 loss 7-48320000-48400000
12 115860000 115980000 loss 12-115860000-115980000
12 113560000 114920000 gain 12-113560000-114920000
Expected output
ID chrom loc.start loc.end num.mark seg.mean lm(percentage of overlap)
AB 1 3010000 173490000 8430 0.0039 %
AB 1 173510000 173590000 5 -17.738
AB 1 173610000 173830000 12 0.011
AB 1 173850000 173970000 6 -16.121
AB 2 3090000 181990000 8434 0.011
BB 12 3090000 68990000 2950 -0.2022
BB 12 69010000 87790000 889 0.0267
BB 12 88010000 98550000 507 -0.3337
BB 12 98570000 115090000 800 0.0586
BB 12 115110000 119350000 197 -0.2031
BB 12 119370000 119430000 4 -20.671
I tried this script, but it did not work.
for (i in 1:nrow(seg2)) {
seg2$lm <- if((seg2$chrom[i] == over$chr[i]) |
(seg2$loc.start[i] <= over$start[i] & seg2$loc.end[i] >= over$end[i]) |
(over$seg.mean[i] >= 0.459 & seg2$CNA[i] == "gain") |
(over$seg.mean[i] <= -0.678 & seg2$CNA[i] == "loss"),
(over$end[i]-over$start[i])/(seg2$loc.end[i]-seg2$loc.start[i])*100)
}
I am aware of the GenomicRanges package, but I would appreciate your suggestions.
Answer 0 (score: 2)
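I strongly recommend using GenomicFeatures to do this efficiently. If you already know how to create your own GRanges objects, you need the following two steps to get the overlap lengths:
# find the overlaps
overlapping.index = findOverlaps(object1, object2)
# get the length of each overlap
# (recent IRanges releases appear to provide this as overlapsRanges(query, subject, hits))
width(ranges(overlapping.index, ranges(object1), ranges(object2)))
where "object1" and "object2" are GRanges objects holding the coordinates, and "overlapping.index" is the index of the overlapping objects.
Once you have the lengths, you can easily get the percentage: divide each overlap length by the length of its segment and multiply by 100.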
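As a minimal sketch of the whole route from data frames to percentages, the example below uses two chromosome 12 rows from each table in the question. It assumes the GenomicRanges package is installed, that the intervals in over do not overlap one another (otherwise the summed lengths would double count), and it swaps in pintersect() to get the overlap lengths instead of the ranges(...) call above. The names seg.gr, over.gr, hits, ov.width and covered are illustrative only.

```r
library(GenomicRanges)

# Two segments from seg2 and two regions from over (chromosome 12 only)
seg2 <- data.frame(
  ID        = c("BB", "BB"),
  chrom     = c("12", "12"),
  loc.start = c(98570000, 115110000),
  loc.end   = c(115090000, 119350000),
  seg.mean  = c(0.0586, -0.2031)
)
over <- data.frame(
  chr   = c("12", "12"),
  start = c(115860000, 113560000),
  end   = c(115980000, 114920000),
  CNA   = c("loss", "gain")
)

# Build GRanges objects from the coordinate columns
seg.gr  <- GRanges(seqnames = seg2$chrom,
                   ranges = IRanges(start = seg2$loc.start, end = seg2$loc.end))
over.gr <- GRanges(seqnames = over$chr,
                   ranges = IRanges(start = over$start, end = over$end))

# Pairs of (segment, region) that overlap
hits <- findOverlaps(seg.gr, over.gr)

# Length of the shared part of each overlapping pair
ov.width <- width(pintersect(seg.gr[queryHits(hits)],
                             over.gr[subjectHits(hits)]))

# Sum the overlap lengths per segment; segments without hits stay at 0
covered <- numeric(length(seg.gr))
sums <- tapply(ov.width, queryHits(hits), sum)
covered[as.integer(names(sums))] <- sums

# Percentage of each segment covered by regions in over
seg2$lm <- 100 * covered / width(seg.gr)
print(seg2)
```

The seg.mean and CNA thresholds from the question are not applied here; one option would be to subset hits to the pairs that meet them before summing.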