Remove rows from a data frame that match rows in another data frame

Time: 2019-01-30 13:05:47

Tags: r dataframe

I have two data frames, for example:

gene_bacteriadf

 seqnames    ranges strand
  [1] scaffold_1      1-50      -
  [2] scaffold_1    60-100      -
  [3] scaffold_1   200-350      -
  [4] scaffold_2 1550-1650      +
  [5] scaffold_2 1900-2300      -
  [6] scaffold_5   250-255      +

overlapdf

seqnames    ranges strand hit with_busco with_bacteria Overlap_with 
scaffold_2 1550-1650      + |      TRUE       101        201        101 0.502487562189055  

The idea is simply to remove the rows of gene_bacteriadf that match a row of overlapdf on the columns seqnames, ranges and strand. I tried:

genes_bacteriadf[!(alist(genes_bacteriadf$seqnames, genes_bacteriadf$start, genes_bacteriaf$end, genes_bacteriadf$width) %in% (alistoverlapsdf$seqnames,overlapsdf$start,overlapsdf$end,overlapsdf$width), ]

But it does not work.

In the example, the range 1550-1650 on scaffold_2 does match, so I should get a new df such as:

 seqnames    ranges strand
  [1] scaffold_1      1-50      -
  [2] scaffold_1    60-100      -
  [3] scaffold_1   200-350      -
  [5] scaffold_2 1900-2300      -
  [6] scaffold_5   250-255      +

Does anyone have an idea?

1 answer:

Answer 0: (score: 1)

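This is a job for dplyr's anti_join, which keeps only the rows of the first data frame that have no match in the second. It is especially convenient here because the matching columns have the same names in both data frames.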

library(dplyr)

gene_bacteriadf %>% 
  anti_join(overlapdf)

Joining, by = c("seqnames", "ranges", "strand")
    seqnames    ranges strand
1 scaffold_1      1-50      -
2 scaffold_1    60-100      -
3 scaffold_1   200-350      -
4 scaffold_2 1900-2300      -
5 scaffold_5   250-255      +
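
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes both objects are plain data frames whose seqnames, ranges and strand columns are character strings; the toy overlapdf below keeps only one extra column (hit) for brevity. key_cols, make_key, filtered and filtered_base are illustrative names, not from the question. Passing by explicitly makes sure only those three columns are compared, even if the two data frames happen to share other column names. A base R variant using a composite key and %in% is included for comparison.

```r
library(dplyr)

# Toy data mirroring the question (assumption: all key columns are character).
gene_bacteriadf <- data.frame(
  seqnames = c("scaffold_1", "scaffold_1", "scaffold_1",
               "scaffold_2", "scaffold_2", "scaffold_5"),
  ranges   = c("1-50", "60-100", "200-350",
               "1550-1650", "1900-2300", "250-255"),
  strand   = c("-", "-", "-", "+", "-", "+"),
  stringsAsFactors = FALSE
)

overlapdf <- data.frame(
  seqnames = "scaffold_2",
  ranges   = "1550-1650",
  strand   = "+",
  hit      = TRUE,
  stringsAsFactors = FALSE
)

# Columns that together identify a row (illustrative name).
key_cols <- c("seqnames", "ranges", "strand")

# dplyr: keep rows of gene_bacteriadf that have no match in overlapdf
# on the key columns.
filtered <- anti_join(gene_bacteriadf, overlapdf, by = key_cols)

# Base R alternative: paste the key columns into one string per row,
# then drop rows whose key also appears in overlapdf.
make_key <- function(df) do.call(paste, c(df[key_cols], sep = "\t"))
filtered_base <- gene_bacteriadf[
  !(make_key(gene_bacteriadf) %in% make_key(overlapdf)), ]

print(filtered)
```

The base R version keeps the original row names (1, 2, 3, 5, 6), while anti_join renumbers the rows, which is why the output above is numbered 1 to 5.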