Subset a data frame based on another data frame with multiple conditions

Time: 2017-11-24 07:36:46

Tags: r dataframe

I have a data frame of methylation array data, shown below, named betatable

sample_A sample_B ... chr    position
0.5      0.3          chr1   75939
0.3      0.6          chr2   11195
...

I would like to subset the data frame above by chr and by a range of positions, producing another data frame. For this I have a second data set, genes_pos

gene   chr    range_lower   range_upper
ABC    chr1   34959         69593
...

I was thinking of using lapply but could not figure it out. Many thanks in advance.

1 answer:

Answer 0 (score: 0)

One approach is to use a non-equi join.

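However, the sample data set that the OP supplied in a now-deleted post has to be prepared first, because the positions are given as factors rather than integers. (Under R 4.0.0 or later, where data.frame() no longer converts strings to factors by default, they would be character strings instead; the conversion below handles both cases.)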
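The conversion goes through as.character() because calling as.integer() directly on a factor returns the internal level codes rather than the numbers shown. A minimal sketch, using a hypothetical vector f that is not part of the OP's data:

```r
# hypothetical factor holding positions as labels
f <- factor(c("335", "380", "221"))

# level codes: the levels are sorted as "221", "335", "380"
codes <- as.integer(f)

# the actual positions: convert the labels to text first, then to integer
positions <- as.integer(as.character(f))

print(codes)      # 2 3 1
print(positions)  # 335 380 221
```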

library(data.table)
# prepare data
setDT(betatable, keep.rownames = "sample.id")
setDT(gene_pos)
# coerce positions from factor to integer
betatable[, pos := as.integer(as.character(pos))]
cols <- c("lower", "upper")
gene_pos[, (cols) := lapply(.SD, function(x) as.integer(as.character(x))), .SDcols = cols]

# non-equi join
betatable[gene_pos, on = .(chr, pos >= lower, pos <= upper), gene := i.gene][!is.na(gene)]
   sample.id probe  chr pos  gene
1:  sample_a   111 chr1 335 geneA
2:  sample_c   200 chr2 221 geneB
3:  sample_e   228 chr2 230 geneC
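As a cross-check, the same three rows can be reproduced in base R by merging on chr and then filtering on the range. This is only a sketch, not part of the original answer: the names bt, gp, m and res are made up for illustration, and the sample data is rebuilt here with integer positions so that the block runs on its own.

```r
# sample data rebuilt with proper types (bt and gp are illustrative names)
bt <- data.frame(
  sample.id = c("sample_a", "sample_b", "sample_c", "sample_d", "sample_e"),
  probe     = c("111", "115", "200", "222", "228"),
  chr       = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  pos       = c(335L, 380L, 221L, 226L, 230L),
  stringsAsFactors = FALSE
)
gp <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  chr   = c("chr1", "chr2", "chr2"),
  lower = c(120L, 200L, 227L),
  upper = c(336L, 222L, 231L),
  stringsAsFactors = FALSE
)

# pair every probe with every gene on the same chromosome
m <- merge(bt, gp, by = "chr")

# keep only the pairs where the position lies inside the gene's range
res <- m[m$pos >= m$lower & m$pos <= m$upper,
         c("sample.id", "probe", "chr", "pos", "gene")]
print(res)
```

The merge builds every probe/gene pair per chromosome before filtering, so it can use far more memory than the non-equi join on a full methylation array. One difference to keep in mind: if a position fell inside several gene ranges, this version would return one row per match, whereas the update join with := keeps a single gene per row (as far as I know, the last match).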

Data supplied by the OP

column <-c("probe","chr","pos")
sample_a <- c("111","chr1","335")
sample_b <- c("115","chr1","380")
sample_c <- c("200","chr2","221")
sample_d <- c("222","chr2","226")
sample_e <- c("228","chr2","230")
betatable <-data.frame(rbind(sample_a,sample_b,sample_c,sample_d,sample_e))
colnames(betatable)<- column

gene_A <- c("geneA","chr1", "120","336")
gene_B <- c("geneB","chr2", "200","222")
gene_C <- c("geneC","chr2", "227","231")
gene_pos <- data.frame(rbind(gene_A,gene_B,gene_C))
colnames(gene_pos)<-c("gene","chr","lower","upper")

betatable
         probe  chr pos
sample_a   111 chr1 335
sample_b   115 chr1 380
sample_c   200 chr2 221
sample_d   222 chr2 226
sample_e   228 chr2 230
gene_pos
        gene  chr lower upper
gene_A geneA chr1   120   336
gene_B geneB chr2   200   222
gene_C geneC chr2   227   231