Possible duplicate:
R: Finding patterns across multiple columns- possibly duplicated()?
Dear all,
Here is part of my data set:
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
63 uc003xlv.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
67 uc010lwh.1 chr8 38387812 38445509 - FLG
68 uc010lwj.1 chr8 38387812 38445509 - FLG
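I want to filter the data set so that each combination of the start, stop and alias columns appears only once. The final result should look like this: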
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
Does anyone know a solution? Thanks!
Answer 0 (score: 7)
Use the duplicated function.
Recreate the data:
x <- " name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
63 uc003xlv.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
67 uc010lwh.1 chr8 38387812 38445509 - FLG
68 uc010lwj.1 chr8 38387812 38445509 - FLG"
dat <- read.table(text = x, header = TRUE)
Remove the duplicates:
dat[!duplicated(dat[, c("start", "stop", "alias")]), ]
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
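To see which rows are dropped and why, it can help to inspect the logical vector that duplicated returns before negating it. The following is a minimal sketch that rebuilds the same data; the names key_cols, is_dup and is_dup_last are illustrative and not part of the original answer.

```r
# Rebuild the example data; the first column becomes the row names
# because the header has one field fewer than the data rows.
x <- "name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
63 uc003xlv.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
67 uc010lwh.1 chr8 38387812 38445509 - FLG
68 uc010lwj.1 chr8 38387812 38445509 - FLG"
dat <- read.table(text = x, header = TRUE)

# Columns that define a duplicate (illustrative name).
key_cols <- c("start", "stop", "alias")

# TRUE marks a row whose start/stop/alias combination
# already appeared in an earlier row.
is_dup <- duplicated(dat[, key_cols])
rownames(dat)[is_dup]       # rows that get dropped: 63, 66, 67, 68

# fromLast = TRUE scans from the bottom, so the LAST occurrence
# of each combination is kept instead of the first.
is_dup_last <- duplicated(dat[, key_cols], fromLast = TRUE)
rownames(dat)[!is_dup_last] # rows that are kept: 60, 63, 64, 65, 68
```

Rows 66 to 68 share start, stop and alias with row 61, which is why none of them survives the default first-occurrence filter.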
Answer 1 (score: 1)
I think your example output is wrong. Try:
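# dfrm is the data frame from the question.
# Paste the three columns into one key per row, then drop repeated keys.
dfrm$comb <- with(dfrm, paste(start, stop, alias, sep = "+"))
dfrm[!duplicated(dfrm$comb), 1:6]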
#---
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
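The pasted key can also be built on the fly, which avoids adding a helper column to the data frame. The sketch below uses a small hypothetical stand-in for dfrm (four of the FLG rows, not the full data set) and checks that the key approach and duplicated applied directly to the columns select the same rows; key, by_key and by_cols are illustrative names.

```r
# Hypothetical stand-in for dfrm with a subset of the FLG rows.
dfrm <- data.frame(
  name  = c("uc003xlp.1", "uc003xlu.1", "uc003xlv.1", "uc010lwg.1"),
  start = c(38387812L, 38400008L, 38400008L, 38387812L),
  stop  = c(38445509L, 38445509L, 38445509L, 38445509L),
  alias = c("FLG", "FLG", "FLG", "FLG"),
  stringsAsFactors = FALSE
)

# Build the key without storing it as an extra column.
key <- with(dfrm, paste(start, stop, alias, sep = "+"))
by_key <- dfrm[!duplicated(key), ]

# Same filter using duplicated() on the columns themselves.
by_cols <- dfrm[!duplicated(dfrm[, c("start", "stop", "alias")]), ]

# Compare the two results.
identical(by_key, by_cols)
```

One caveat with a pasted key: if the separator can occur inside a value, two different rows may produce the same string. That is not a concern for numeric coordinates and gene symbols like these, but passing the columns to duplicated directly sidesteps the issue.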