I sorted a data frame by its second and third columns using the following code:
EXP[rev(order(EXP[, 2], EXP[, 3])), ]
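where EXP is the name of the data frame.
Now I only need the first row for each identifier in the second column, keeping the sorted order. What is the best way to do this in R?
The data look like this: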
A_1784709 10007 0.40446362
B_2329958 10006 0.22501015
A_1739081 10006 0.10621801
B_1679600 10005 0.51709792
A_1770963 10004 0.21095531
A_2067520 100033416 0.08301735
A_1740024 10003 0.40881969
B_1751882 10002 0.09964711
A_1667906 10002 0.08826233
B_1791916 10002 0.08408508
A_1775734 10044 0.28613624
B_1674440 10044 0.16204336
B_2321648 10044 0.15484888
B_1654543 10001 0.27293547
B_1733559 100008589 1.03071504
A_2325610 10000 0.29913509
A_1733598 10000 0.14406499
B_1757130 10000 0.12600686
A_1779228 1000 0.37764131
A_1803686 100 0.62712817
A_1670903 10 0.09947230
I need a result like this:
A_1784709 10007 0.40446362
B_2329958 10006 0.22501015
B_1679600 10005 0.51709792
A_1770963 10004 0.21095531
A_2067520 100033416 0.08301735
A_1740024 10003 0.40881969
B_1751882 10002 0.09964711
A_1775734 10044 0.28613624
B_1654543 10001 0.27293547
B_1733559 100008589 1.03071504
A_2325610 10000 0.29913509
A_1779228 1000 0.37764131
A_1803686 100 0.62712817
A_1670903 10 0.09947230
Answer 0 (score: 1)
We can use duplicated() here, negated with !:
> EXP[!duplicated(EXP[,2]),]
V1 V2 V3
1 A_1784709 10007 0.40446362
2 B_2329958 10006 0.22501015
4 B_1679600 10005 0.51709792
5 A_1770963 10004 0.21095531
6 A_2067520 100033416 0.08301735
7 A_1740024 10003 0.40881969
8 B_1751882 10002 0.09964711
11 A_1775734 10044 0.28613624
14 B_1654543 10001 0.27293547
15 B_1733559 100008589 1.03071504
16 A_2325610 10000 0.29913509
19 A_1779228 1000 0.37764131
20 A_1803686 100 0.62712817
21 A_1670903 10 0.09947230
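To see why this works: duplicated() marks every element that has already appeared earlier in the vector, so negating it keeps the first occurrence of each value. Here is a minimal sketch on a toy data frame (the name `toy`, its column names, and its values are made up for illustration and are not the asker's data):

```r
# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(
  id    = c("a", "b", "c", "d", "e"),
  key   = c(7L, 6L, 6L, 5L, 7L),
  score = c(0.4, 0.2, 0.1, 0.5, 0.3)
)

# TRUE marks a key that already occurred in an earlier row
dup <- duplicated(toy$key)
print(dup)

# Keep only the first row seen for each key
first_rows <- toy[!dup, ]
print(first_rows)
```

Because the first occurrence is the one that is kept, the result depends on row order, so sort the data frame first so that the row you want for each key comes first.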
Data
EXP <- structure(list(V1 = structure(c(9L, 21L, 4L, 15L, 6L, 11L, 5L,
17L, 1L, 19L, 7L, 14L, 20L, 13L, 16L, 12L, 3L, 18L, 8L, 10L, 2L),
.Label = c("A_1667906", "A_1670903", "A_1733598", "A_1739081",
"A_1740024", "A_1770963", "A_1775734", "A_1779228", "A_1784709",
"A_1803686", "A_2067520", "A_2325610", "B_1654543", "B_1674440",
"B_1679600", "B_1733559", "B_1751882", "B_1757130", "B_1791916",
"B_2321648", "B_2329958"), class = "factor"), V2 = c(10007L,
10006L, 10006L, 10005L, 10004L, 100033416L, 10003L, 10002L, 10002L,
10002L, 10044L, 10044L, 10044L, 10001L, 100008589L, 10000L, 10000L,
10000L, 1000L, 100L, 10L), V3 = c(0.40446362, 0.22501015, 0.10621801,
0.51709792, 0.21095531, 0.08301735, 0.40881969, 0.09964711, 0.08826233,
0.08408508, 0.28613624, 0.16204336, 0.15484888, 0.27293547, 1.03071504,
0.29913509, 0.14406499, 0.12600686, 0.37764131, 0.62712817, 0.0994723)),
.Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -21L))
Answer 1 (score: 1)
Suppose you have an extra column containing 1, 2, 3 and one duplicated row:
EXP$V4 <- c(rep(c(1,2,3),nrow(EXP)/3))
EXP <- rbind(EXP,data.frame(V1="B_1791916",V2=10002,V3=0.08408508,V4=3))
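and you want the non-duplicated rows with respect to column 2 and this extra column. Combining two !duplicated() tests keeps only the rows where both values occur for the first time, which gives you this: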
EXP[!duplicated(EXP[,2]) & !duplicated(EXP[,4]),]
V1 V2 V3 V4
1 A_1784709 10007 0.4044636 1
2 B_2329958 10006 0.2250101 2
whereas unique() gives you the following:
unique(EXP[c("V4", "V2")])
V4 V2
1 1 10007
2 2 10006
3 3 10006
4 1 10005
5 2 10004
6 3 100033416
7 1 10003
8 2 10002
9 3 10002
10 1 10002
11 2 10044
12 3 10044
13 1 10044
14 2 10001
15 3 100008589
16 1 10000
17 2 10000
18 3 10000
19 1 1000
20 2 100
21 3 10
unique() removes repeats of the combination of the two columns. duplicated(), by contrast, is good at flagging repeated observations.
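If the full rows are needed rather than just the key columns, duplicated() can also be applied to both columns at once, which flags repeated combinations the same way unique() does. Here is a small sketch, assuming a made-up data frame `toy` with hypothetical columns `grp` and `key`:

```r
# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(
  id  = c("a", "b", "c", "d"),
  grp = c(1, 2, 1, 2),
  key = c(10, 10, 10, 20)
)

# Rows whose (grp, key) pair has not appeared in an earlier row
combo_first <- toy[!duplicated(toy[c("grp", "key")]), ]
print(combo_first)

# unique() on the same two columns selects the same rows,
# but drops the remaining columns
pairs <- unique(toy[c("grp", "key")])
print(pairs)
```

This keeps the other columns alongside the deduplicated pair, which unique() on a column subset does not.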