从数据框中删除具有交叉信息列的行,比较两列

时间:2012-11-16 10:58:33

标签: r dataframe specular

当我尝试根据配对基因条件合并不同的基因表达结果时,这是我特别做的噩梦之一,这是我的合并数据框:

knowngene1   Logfold1        Gene1   knowngene2   Logfold2        Gene2
uc001ezv.3  5.1167021111    NA  uc001ezu.1  5.6262305191    FLG
uc001ihe.4  4.1338871783    LOC100216001    uc001ihg.3  3.9475325801    NA
uc001iki.4  9.9902455211    CELF2   uc001ikn.2  9.3321964303    NA
uc001ikk.2  10.3059806111   CELF2   uc001ikn.2  9.3321964303    NA
uc001ikl.4  9.9890468379    CELF2   uc001ikn.2  9.3321964303    NA
uc001ikn.2  9.8293484977    NA  uc001iki.4  9.4401488053    CELF2
uc001ikn.2  9.8293484977    NA  uc001ikk.2  9.2887954663    CELF2
uc001ikn.2  9.8293484977    NA  uc001ikl.4  9.4401488053    CELF2
uc001ikn.2  9.8293484977    NA  uc010qbi.2  8.6399349792    CELF2
uc001ikn.2  9.8293484977    NA  uc010qbj.1  9.2887954663    CELF2
uc001ezu.1  5.6262305191    FLG uc001ezv.3  5.1167021111    NA
uc001ihg.3  3.9475325801    NA  uc001ihe.4  4.1338871783    LOC100216001
uc001iki.4  9.4401488053    CELF2   uc001ikn.2  9.8293484977    NA
uc001ikk.2  9.2887954663    CELF2   uc001ikn.2  9.8293484977    NA
uc001ikl.4  9.4401488053    CELF2   uc001ikn.2  9.8293484977    NA
uc001ikn.2  9.3321964303    NA  uc001iki.4  9.9902455211    CELF2
uc001ikn.2  9.3321964303    NA  uc001ikk.2  10.3059806111   CELF2
uc001ikn.2  9.3321964303    NA  uc001ikl.4  9.9890468379    CELF2
uc001ikn.2  9.3321964303    NA  uc010qbi.2  10.3865530025   CELF2
uc001ikn.2  9.3321964303    NA  uc010qbj.1  10.3072927485   CELF2
uc001iot.1  6.9068905956    NA  uc001iou.2  8.4040043896    VIM
uc001iou.2  10.4420548632   VIM uc001iot.1  5.8235197903    NA
uc001ipd.3  4.4693510978    ST8SIA6 uc001ipf.1  5.1931857169    NA
uc001kgd.3  3.5469561781    NA  uc009xts.3  4.0607448636    IFIT2
uc001kgf.3  3.3975573789    IFIT3   uc001kgd.3  3.2512633588    NA

关键是我想要删除不重复的行,当然没有,我想删除那些在knowngene1和knongene2中已经更改了knowngene accessor的行。让我举个例子,第一个是我要保留的行

uc001ikn.2  9.8293484977    NA  uc001iki.4  9.4401488053    CELF2

对我来说这些下一行是相同的,实际上第一行是我要保留的那个镜面图像,尽管它的表达式值或多或少都在同一范围内

uc001iki.4  9.4401488053    CELF2   uc001ikn.2  9.8293484977    NA
uc001ikn.2  9.3321964303    NA  uc001ikl.4  9.9890468379    CELF2

所以我的想法是只保留我看到的第一个并删除下一个。你有什么想法吗?

1 个答案:

答案 0 :(得分:1)

您要删除uc001ikn.2出现的所有行吗?如果是这样,我认为这将有效:

Rgames> foo
     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9
[4,]    2    3
[5,]    4    1
[6,]    3   10
[7,]    5   11
[8,]    6   12
Rgames> foo[!duplicated(foo[,1])&!(foo[,2]%in%duplicated(foo[,1])),]
     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9
[4,]    5   11
[5,]    6   12

在您的情况下,您可以使用df$knowngene1df$knowngene2列。