Keep only duplicated values within each row

Time: 2019-06-28 08:49:21

Tags: r duplicates

I have a data frame that looks like this:

A = c(4.3, 0.2, 3.7, 1.5, 0.5, 1.6, 2.7)
P = c(4.2, 2.1, 3.0, 2.8, 1.1, 2.3, 3.0)
T1 = c("a", "a1", "e1", "d1", "a3", "f1", "f2") 
T2 = c("a", "b1", "a1", "b2", "a3", "f1", "f3")
T3 = c("c", "c1", "e1", "b2", "k1", "a4", "f3")
T4 = c(NA, "b1", "e1", "b3", "c1", "b3", "f5")
T5 = c(NA, NA, NA, NA, "d6", "a4", "f6")
T6 = c(NA, NA, NA, NA, "f4",  NA, "f7") 
T7 = c(NA, NA, NA, NA, NA, NA, "c1")
T8 = c(NA, NA, NA, NA, NA, NA, "c8")
T9 = c(NA, NA, NA, NA, NA, NA, "f1")
T10= c(NA, NA, NA, NA, NA, NA, "k3")

df1 <- data.frame(A, P, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10)

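I want to remove all unique values row-wise and keep only the values that occur more than once in each row, so I would like to get this: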

A = c(4.3, 0.2, 3.7, 1.5, 0.5, 1.6, 2.7)
P = c(4.2, 2.1, 3.0, 2.8, 1.1, 2.3, 3.0)
T1 = c("a", NA, "e1", NA, "a3", "f1", NA) 
T2 = c("a", "b1", NA, "b2", "a3", "f1", "f3")
T3 = c(NA, NA, "e1", "b2", NA, "a4", "f3")
T4 = c(NA, "b1", "e1", NA, NA, NA, NA)
T5 = c(NA, NA, NA, NA, NA, "a4", NA)
T6 = c(NA, NA, NA, NA, NA, NA, NA) 
T7 = c(NA, NA, NA, NA, NA, NA, NA)
T8 = c(NA, NA, NA, NA, NA, NA, NA)
T9 = c(NA, NA, NA, NA, NA, NA, NA)
T10= c(NA, NA, NA, NA, NA, NA, NA)

df2 <- data.frame(A, P, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10)

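I know how to do the opposite, i.e. remove all duplicates, so I tried adapting that code to remove the non-duplicates instead. However, it keeps only one record from each group of duplicates, and the values in the "A" and "P" columns are deleted. I then tried running the code on the "T" columns only, but then it did not even return a data frame. Here is my first attempt: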

df2 <- as.data.frame(t(apply(df1, 1, function(x) {x[!duplicated(x)] <- NA; x}))) 

And here is the attempt to restrict the code to certain columns:

df2 <- as.data.frame(t(apply(select_if(df1, grepl("T^[0-9]+$", colnames(df1)==T)), 1, function(x) {x[!duplicated(x)] <- NA; x}))) 

Any suggestions would be greatly appreciated, thanks.

1 answer:

Answer 0: (score: 3)

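duplicated(x) on its own flags only the second and later occurrences of a value, so you also need duplicated(x, fromLast = TRUE) to capture every occurrence, i.e.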

i1 <- t(apply(df1[-c(1, 2)], 1, function(i)duplicated(i)|duplicated(i, fromLast = TRUE)))
df1[-c(1, 2)][!i1] <- NA
df1
#    A   P   T1   T2   T3   T4   T5   T6   T7   T8   T9  T10
#1 4.3 4.2    a    a <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#2 0.2 2.1 <NA>   b1 <NA>   b1 <NA> <NA> <NA> <NA> <NA> <NA>
#3 3.7 3.0   e1 <NA>   e1   e1 <NA> <NA> <NA> <NA> <NA> <NA>
#4 1.5 2.8 <NA>   b2   b2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#5 0.5 1.1   a3   a3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#6 1.6 2.3   f1   f1   a4 <NA>   a4 <NA> <NA> <NA> <NA> <NA>
#7 2.7 3.0 <NA>   f3   f3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
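
To see why both calls are needed, and how the same idea could be applied without overwriting df1, here is a minimal sketch. The helper name keep_row_duplicates, the toy data frame (rows 3 and 6 of the question's data, columns T1 to T5 only) and the pattern "^T[0-9]+$" for picking the T columns by name are assumptions made for illustration; they are not part of the answer above. It uses an explicit loop over rows rather than apply(), which is just one possible way to write it.

```r
# duplicated() flags only the second and later occurrences of a value;
# with fromLast = TRUE it scans from the end instead. OR-ing the two
# flags every occurrence of a value that appears more than once.
x <- c("e1", "a1", "e1", "e1")
duplicated(x)                    # FALSE FALSE  TRUE  TRUE
duplicated(x, fromLast = TRUE)   #  TRUE FALSE  TRUE FALSE
duplicated(x) | duplicated(x, fromLast = TRUE)  # TRUE FALSE TRUE TRUE

# Hypothetical helper: blank out values that occur only once within a row,
# restricted to columns whose names match `pattern`. Returns a new data
# frame and leaves the input untouched.
keep_row_duplicates <- function(df, pattern = "^T[0-9]+$") {
  cols <- grep(pattern, names(df))
  m <- as.matrix(df[cols])       # character matrix of the selected columns
  for (i in seq_len(nrow(m))) {
    r <- m[i, ]
    repeated <- duplicated(r) | duplicated(r, fromLast = TRUE)
    m[i, !repeated] <- NA
  }
  # Note: the selected columns come back as character, not factor.
  df[cols] <- as.data.frame(m, stringsAsFactors = FALSE)
  df
}

# Toy input: rows 3 and 6 of the question's data, columns A, P, T1..T5.
toy <- data.frame(
  A  = c(3.7, 1.6),
  P  = c(3.0, 2.3),
  T1 = c("e1", "f1"),
  T2 = c("a1", "f1"),
  T3 = c("e1", "a4"),
  T4 = c("e1", "b3"),
  T5 = c(NA,   "a4")
)

res <- keep_row_duplicates(toy)
res
```

Existing NA cells simply stay NA, and A and P are never touched because their names do not match the pattern. Selecting by name rather than by df1[-c(1, 2)] avoids depending on the column order, at the cost of assuming the T columns follow a consistent naming scheme.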