In R (albeit long-winded):
Here is a test data.frame:
df <- data.frame(
"CHR" = c(1,1,1,2,2),
"START" = c(100, 200, 300, 100, 400),
"STOP" = c(150,350,400,500,450)
)
First I make a GRanges object:
# Attach GenomicRanges so that IRanges(), reduce() and findOverlaps() are available
library(GenomicRanges)
gr <- GenomicRanges::GRanges(
seqnames = df$CHR,
ranges = IRanges(start = df$START, end = df$STOP)
)
Then I reduce the intervals to collapse them into a new GRanges object:
reduced <- reduce(gr)
Now append a new column to the original data frame to identify which rows belong to the same contiguous merged interval:
subjectHits(findOverlaps(gr, reduced))
How can I do this in Python? I know about pybedtools, but as far as I can tell it requires me to save my data.frame to disk. Any help is appreciated.
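Answer 0 (score: 1)
It looks like you are trying to get the intersection of these intervals. Pybedtools will accept a stream as input. Read your data into a string in BED format:
"CHR, START, STOP"
I started from a Python dictionary and looped over it.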
bed_string += "{0} {1} {2} {3} {0}|{1}|{2}|{3}\n".format(chrom, coord_start, coord_stop, aberration)
# Now create your bedtools.
breakpoint_bedtool = pybedtools.BedTool(bed_string, from_string=True)
target_bedtool = pybedtools.BedTool(self.args.Target_Bed_File, from_string=False)
# Find target intersects for printing.
breakpoint_target_intersect = breakpoint_bedtool.intersect(target_bedtool, wb=True, stream=True)
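For the data in the question, building the BED-format string could look roughly like the sketch below. The helper name `to_bed_string` and the plain lists are my own illustration, not part of pybedtools, and the coordinates are written unchanged (BED is 0-based and half-open, so whether to subtract 1 from the starts depends on your convention).

```python
# Minimal sketch: turn in-memory interval columns into a BED-formatted string.
# `to_bed_string` is a hypothetical helper name, not a pybedtools function.

def to_bed_string(chroms, starts, ends):
    """Return one tab-separated 'chrom start end' line per interval."""
    lines = []
    for chrom, start, end in zip(chroms, starts, ends):
        lines.append(f"{chrom}\t{start}\t{end}")
    # A trailing newline keeps the last record terminated.
    return "\n".join(lines) + "\n"


chroms = [1, 1, 1, 2, 2]
starts = [100, 200, 300, 100, 400]
ends = [150, 350, 400, 500, 450]

bed_string = to_bed_string(chroms, starts, ends)
print(bed_string)
```

The resulting `bed_string` can then be handed to `pybedtools.BedTool(bed_string, from_string=True)` as in the snippet above, so nothing has to be written to disk first.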
Answer 1 (score: 0)
https://github.com/biocore-ntnu/pyranges
import pyranges as pr
chromosomes = [1] * 3 + [2] * 2
starts = [100, 200, 300, 100, 400]
ends = [150, 350, 400, 500, 450]
gr = pr.PyRanges(chromosomes=chromosomes, starts=starts, ends=ends)
gr.cluster()
# +--------------+-----------+-----------+-----------+
# | Chromosome | Start | End | Cluster |
# | (int8) | (int32) | (int32) | (int64) |
# |--------------+-----------+-----------+-----------|
# | 1 | 100 | 150 | 1 |
# | 1 | 200 | 350 | 2 |
# | 1 | 300 | 400 | 2 |
# | 2 | 100 | 500 | 3 |
# | 2 | 400 | 450 | 3 |
# +--------------+-----------+-----------+-----------+
This will be released in version 0.0.21. Thanks for the idea!
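If you would rather not depend on a particular library version, the same grouping can be sketched in plain Python. This is not how pyranges implements `cluster()`; it is a simple sort-and-sweep, and the function name `cluster_intervals` is hypothetical. It assumes closed intervals as in the R example, so two intervals on the same chromosome overlap when the next start is less than or equal to the running end.

```python
# Minimal sketch: assign a cluster id to each interval so that overlapping
# intervals on the same chromosome share an id.
# Assumption: intervals are closed (start and end both included).

def cluster_intervals(chroms, starts, ends):
    """Return a list of 1-based cluster ids, one per input row, in input order."""
    # Visit rows sorted by chromosome, then start, but remember original positions.
    order = sorted(range(len(starts)), key=lambda i: (chroms[i], starts[i], ends[i]))
    ids = [0] * len(starts)
    cluster = 0
    current_chrom = None
    current_end = None
    for i in order:
        if chroms[i] != current_chrom or starts[i] > current_end:
            # New chromosome, or a gap after the running end: start a new cluster.
            cluster += 1
            current_chrom = chroms[i]
            current_end = ends[i]
        else:
            # Overlaps the current cluster: extend its running end if needed.
            current_end = max(current_end, ends[i])
        ids[i] = cluster
    return ids


chromosomes = [1, 1, 1, 2, 2]
starts = [100, 200, 300, 100, 400]
ends = [150, 350, 400, 500, 450]

clusters = cluster_intervals(chromosomes, starts, ends)
print(clusters)  # [1, 2, 2, 3, 3]
```

The ids come back in the original row order, so they can be attached directly as a new column of the input table.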