我有一个结果数据框。 Cruise_Strata有多种比较。我有两列cruise_strata(Cruise1_Strata1和Cruise2_Strata2)。我发现的问题是数据框中有“重复”记录。例如,一行将具有
Cruise_Strata1 Cruise_Strata2
201501.35 201502.35
,另一行将有
Cruise_Strata1 Cruise_Strata2
201502.35 201501.35
对于其余列,行具有相同的结果。我希望能够识别发生这种情况的行并从数据集中删除一行,但不知道如何去做。我不能使用复制品,因为它们不是重复的。
任何帮助将不胜感激。
这是数据帧。
dput(result5)
structure(list(Cruise_Strata1 = structure(c(1L, 1L, 2L, 2L, 3L,
3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L,
11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L,
17L, 18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L,
24L, 24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L, 28L, 29L, 29L, 30L,
30L, 31L, 31L, 32L, 32L, 33L, 33L, 34L, 34L, 35L, 35L, 36L, 36L,
37L, 37L, 38L, 38L, 39L, 39L, 40L, 40L, 41L, 41L, 42L, 42L, 43L,
43L, 44L, 44L, 45L, 45L, 46L, 46L, 47L, 47L, 48L, 48L, 49L, 49L,
50L, 50L, 51L, 51L, 52L, 52L, 53L, 53L, 54L, 54L, 55L, 55L, 56L,
56L, 57L, 57L, 58L, 58L, 59L, 59L, 60L, 60L, 61L, 61L, 62L, 62L,
63L, 63L, 64L, 64L, 65L, 65L, 66L, 66L), .Label = c("201501.10",
"201501.11", "201501.13", "201501.14", "201501.15", "201501.17",
"201501.18", "201501.19", "201501.21", "201501.22", "201501.23",
"201501.24", "201501.25", "201501.26", "201501.27", "201501.29",
"201501.30", "201501.31", "201501.33", "201501.34", "201501.35",
"201501.9", "201502.10", "201502.11", "201502.13", "201502.14",
"201502.15", "201502.17", "201502.18", "201502.19", "201502.21",
"201502.22", "201502.23", "201502.24", "201502.25", "201502.26",
"201502.27", "201502.29", "201502.30", "201502.31", "201502.33",
"201502.34", "201502.35", "201502.9", "201503.10", "201503.11",
"201503.13", "201503.14", "201503.15", "201503.17", "201503.18",
"201503.19", "201503.21", "201503.22", "201503.23", "201503.24",
"201503.25", "201503.26", "201503.27", "201503.29", "201503.30",
"201503.31", "201503.33", "201503.34", "201503.35", "201503.9"
), class = "factor"), Cruise_Strata2 = structure(c(23L, 45L,
24L, 46L, 25L, 47L, 26L, 48L, 27L, 49L, 28L, 50L, 29L, 51L, 30L,
52L, 31L, 53L, 32L, 54L, 33L, 55L, 34L, 56L, 35L, 57L, 36L, 58L,
37L, 59L, 38L, 60L, 39L, 61L, 40L, 62L, 41L, 63L, 42L, 64L, 43L,
65L, 44L, 66L, 1L, 45L, 2L, 46L, 3L, 47L, 4L, 48L, 5L, 49L, 6L,
50L, 7L, 51L, 8L, 52L, 9L, 53L, 10L, 54L, 11L, 55L, 12L, 56L,
13L, 57L, 14L, 58L, 15L, 59L, 16L, 60L, 17L, 61L, 18L, 62L, 19L,
63L, 20L, 64L, 21L, 65L, 22L, 66L, 1L, 23L, 2L, 24L, 3L, 25L,
4L, 26L, 5L, 27L, 6L, 28L, 7L, 29L, 8L, 30L, 9L, 31L, 10L, 32L,
11L, 33L, 12L, 34L, 13L, 35L, 14L, 36L, 15L, 37L, 16L, 38L, 17L,
39L, 18L, 40L, 19L, 41L, 20L, 42L, 21L, 43L, 22L, 44L), .Label = c("201501.10",
"201501.11", "201501.13", "201501.14", "201501.15", "201501.17",
"201501.18", "201501.19", "201501.21", "201501.22", "201501.23",
"201501.24", "201501.25", "201501.26", "201501.27", "201501.29",
"201501.30", "201501.31", "201501.33", "201501.34", "201501.35",
"201501.9", "201502.10", "201502.11", "201502.13", "201502.14",
"201502.15", "201502.17", "201502.18", "201502.19", "201502.21",
"201502.22", "201502.23", "201502.24", "201502.25", "201502.26",
"201502.27", "201502.29", "201502.30", "201502.31", "201502.33",
"201502.34", "201502.35", "201502.9", "201503.10", "201503.11",
"201503.13", "201503.14", "201503.15", "201503.17", "201503.18",
"201503.19", "201503.21", "201503.22", "201503.23", "201503.24",
"201503.25", "201503.26", "201503.27", "201503.29", "201503.30",
"201503.31", "201503.33", "201503.34", "201503.35", "201503.9"
), class = "factor"), P_value = c(0.63, 0.6793, 0.0319, 0.0289,
0.9516, 0.8128, 0.9967, 0.3071, 0.9641, 0.0246, 0.7967, 0.2551,
0.2329, 0.3725, 0.0269, 0.3796, 0.0245, 0.5562, 0.9952, 0.5176,
0.5596, 0.9966, 0.32, 0.6402, 0.7691, 0.9671, 0.9396, 0.9, 0.9024,
0.3624, 0.0433, 0.3402, 0.5302, 0.787, 0.0295, 0.3638, 0.006,
0.701, 0.6323, 0.0366, 2e-04, 0.0011, 0.8849, 0.3, 0.63, 0.9738,
0.0319, 0.5197, 0.9516, 0.7369, 0.9967, 0.2276, 0.9641, 0.0158,
0.7967, 0.6332, 0.2329, 0.0322, 0.0269, 0.3013, 0.0245, 0.0129,
0.9952, 0.795, 0.5596, 0.7277, 0.32, 0.747, 0.7691, 0.3817, 0.9396,
0.7961, 0.9024, 0.4164, 0.0433, 0.0028, 0.5302, 0.2864, 0.0295,
0.7036, 0.006, 0, 0.6323, 0.002, 2e-04, 0.9548, 0.8849, 0.0546,
0.6793, 0.9738, 0.0289, 0.5197, 0.8128, 0.7369, 0.3071, 0.2276,
0.0246, 0.0158, 0.2551, 0.6332, 0.3725, 0.0322, 0.3796, 0.3013,
0.5562, 0.0129, 0.5176, 0.795, 0.9966, 0.7277, 0.6402, 0.747,
0.9671, 0.3817, 0.9, 0.7961, 0.3624, 0.4164, 0.3402, 0.0028,
0.787, 0.2864, 0.3638, 0.7036, 0.701, 0, 0.0366, 0.002, 0.0011,
0.9548, 0.3, 0.0546), Cruise1 = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("201501",
"201502", "201503"), class = "factor"), Cruise1_Strata1 = structure(c(1L,
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L,
9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L,
16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L,
22L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L,
8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L, 20L,
21L, 21L, 22L, 22L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L,
6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L,
13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L,
20L, 20L, 21L, 21L, 22L, 22L), .Label = c("10", "11", "13", "14",
"15", "17", "18", "19", "21", "22", "23", "24", "25", "26", "27",
"29", "30", "31", "33", "34", "35", "9"), class = "factor"),
Cruise2 = structure(c(2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L,
2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L,
3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L), .Label = c("201501", "201502", "201503"), class = "factor"),
Cruise2_Strata2 = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L,
17L, 18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L, 1L,
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L,
9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L,
15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L, 20L,
21L, 21L, 22L, 22L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L,
12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L,
18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L), .Label = c("10",
"11", "13", "14", "15", "17", "18", "19", "21", "22", "23",
"24", "25", "26", "27", "29", "30", "31", "33", "34", "35",
"9"), class = "factor"), adjuste_p = c(1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.792, 1, 1, 1, 0.0264,
0.1452, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.3696, 1,
1, 1, 1, 0.792, 0, 1, 0.264, 0.0264, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0.3696, 1, 1, 1, 1, 1, 0, 1, 0.264,
0.1452, 1, 1, 1)), .Names = c("Cruise_Strata1", "Cruise_Strata2",
"P_value", "Cruise1", "Cruise1_Strata1", "Cruise2", "Cruise2_Strata2",
"adjuste_p"), row.names = c(1453L, 2905L, 1520L, 2972L, 1587L,
3039L, 1654L, 3106L, 1721L, 3173L, 1788L, 3240L, 1855L, 3307L,
1922L, 3374L, 1989L, 3441L, 2056L, 3508L, 2123L, 3575L, 2190L,
3642L, 2257L, 3709L, 2324L, 3776L, 2391L, 3843L, 2458L, 3910L,
2525L, 3977L, 2592L, 4044L, 2659L, 4111L, 2726L, 4178L, 2793L,
4245L, 2860L, 4312L, 23L, 2927L, 90L, 2994L, 157L, 3061L, 224L,
3128L, 291L, 3195L, 358L, 3262L, 425L, 3329L, 492L, 3396L, 559L,
3463L, 626L, 3530L, 693L, 3597L, 760L, 3664L, 827L, 3731L, 894L,
3798L, 961L, 3865L, 1028L, 3932L, 1095L, 3999L, 1162L, 4066L,
1229L, 4133L, 1296L, 4200L, 1363L, 4267L, 1430L, 4334L, 45L,
1497L, 112L, 1564L, 179L, 1631L, 246L, 1698L, 313L, 1765L, 380L,
1832L, 447L, 1899L, 514L, 1966L, 581L, 2033L, 648L, 2100L, 715L,
2167L, 782L, 2234L, 849L, 2301L, 916L, 2368L, 983L, 2435L, 1050L,
2502L, 1117L, 2569L, 1184L, 2636L, 1251L, 2703L, 1318L, 2770L,
1385L, 2837L, 1452L, 2904L), class = "data.frame")
R Info
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
答案 0 :(得分:2)
这能否为您提供所需的结果?
duplicated(apply(cbind(result5$Cruise_Strata1, df$Cruise_Strata2), 1,
function(x) paste(min(x), max(x))))
您可以使用生成的逻辑向量对数据进行子集化。
首先,您创建一个贴在Cruise_Strata1
和Cruise_Strata2
中的值的向量。这样做可以将两者中较小的一个移到前面,将较大的一个移到最后(或者你可以反之亦然)。这只是一个技巧,您可以应用duplicated
功能并识别重复项。
注意:此方法将删除表单的重复项:
Cruise_Strata1 Cruise_Strata2
x y
y x
以及(如果不希望这个让我知道):
Cruise_Strata1 Cruise_Strata2
x y
x y
答案 1 :(得分:0)
对于df
和Cruise_Strata1
中具有重复值的通用数据框Cruise_Strata2
:
df$dupe <- 0
for(i in 1:(length(df$Cruise_Strata1)-1))
{
for(j in (i+1):length(df$Cruise_Strata1))
if(df$Cruise_Strata1[i]==df$Cruise_Strata2[j])
{print(df[c(i,j),]); df$dupe[i] = 1;break}
}
df[df$dupe != 1,]