Filtering a data frame by columns in R

Time: 2011-05-18 14:37:29

Tags: r filter dataframe subset

I have a question about the following data frame:

genes <- matrix(c("chr1","chr2","chr2","chr2","chr2","chr2",
              "uc001upw.2","uc001upw.2","uc001upw.2","uc001upx.1","uc001upy.1","uc001upz.1",
              "188001308","188001308","188001308","188037202","188037202","188037202",
              "188021266","188021266","188021266","188086618","188127464","188127464",
              "-","-","-","-","-","-",
              "CARCRL","CALCRL","CALCRL","TFPI","TFPI","TFPI", 
              "uc001upx.1","uc001upy.1","uc001upz.1","uc001upw.2","uc001upw.2","uc001upw.2",
              "188037202","188037202","188037202","188001308","188001308","188001308",
              "188086618","188127464","188127464","188021266","188021266","188021266",
              "-","-","-","-","-","-",
              "TFPI","TFPI","TFPI","CALCRL","CALCRL","CALCRL",
              "35894","35894","35894","35894","35894","35894"), nrow=6)

colnames(genes)<- c("chr","names.x","start.x","stop.x","strand.x","alias.x","name.y","start.y","stop.y","strand.y", "alias.y", "distance_startsite")
genes<-as.data.frame(genes)

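In the data frame you can see that the first three rows are unique with respect to names.x and name.y. Rows 4, 5 and 6 are not unique: they contain the same pairs as rows 1 to 3, only with the two names swapped. My question is: is there a way to filter these out?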

Thanks! Samantha

1 answer:

Answer 0: (score: 1)

I'm sure this isn't the prettiest way, but it gets the job done:

genes[!duplicated(t(apply(genes[,c('names.x','name.y')],1,sort))),]
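
To see why the one-liner works, it can help to split it into steps. The sketch below is only an illustration: the data frame `pairs` and its short values are made up for readability and hold just the two name columns, and the `pmin`/`pmax` variant at the end is an alternative that is not part of the answer above.

```r
# Hypothetical toy data: only the two name columns from the question,
# with made-up short values so the pairs are easy to read.
pairs <- data.frame(
  names.x = c("a", "a", "a", "b", "c", "d"),
  name.y  = c("b", "c", "d", "a", "a", "a"),
  stringsAsFactors = FALSE
)

# Step 1: sort the two names within each row, so ("b", "a") becomes ("a", "b").
# apply() returns one column per input row, hence the t() to get rows back.
key <- t(apply(pairs[, c("names.x", "name.y")], 1, sort))

# Step 2: duplicated() on a matrix compares whole rows and flags later repeats.
keep <- !duplicated(key)

# Step 3: keep only the first occurrence of each unordered pair (rows 1 to 3 here).
filtered <- pairs[keep, ]

# Alternative without apply() (an assumption, not from the answer): build the
# same order-independent key from the element-wise minimum and maximum.
# as.character() guards against the columns being factors.
x <- as.character(pairs$names.x)
y <- as.character(pairs$name.y)
keep2 <- !duplicated(data.frame(lo = pmin(x, y), hi = pmax(x, y)))
```

Both variants keep the first row of each unordered pair and drop the later, reversed one. The `pmin`/`pmax` form avoids the row-wise `apply()` call, which may matter on larger data frames.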