I have a data frame that looks like this:
A = c(4.3, 0.2, 3.7, 1.5, 0.5, 1.6, 2.7)
P = c(4.2, 2.1, 3.0, 2.8, 1.1, 2.3, 3.0)
T1 = c("a", "a1", "e1", "d1", "a3", "f1", "f2")
T2 = c("a", "b1", "a1", "b2", "a3", "f1", "f3")
T3 = c("c", "c1", "e1", "b2", "k1", "a4", "f3")
T4 = c(NA, "b1", "e1", "b3", "c1", "b3", "f5")
T5 = c(NA, NA, NA, NA, "d6", "a4", "f6")
T6 = c(NA, NA, NA, NA, "f4", NA, "f7")
T7 = c(NA, NA, NA, NA, NA, NA, "c1")
T8 = c(NA, NA, NA, NA, NA, NA, "c8")
T9 = c(NA, NA, NA, NA, NA, NA, "f1")
T10= c(NA, NA, NA, NA, NA, NA, "k3")
df1 <- data.frame(A, P, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10)
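Working row by row, I want to remove every value that occurs only once and keep only the values that are duplicated within that row, so I would like to get this: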
A = c(4.3, 0.2, 3.7, 1.5, 0.5, 1.6, 2.7)
P = c(4.2, 2.1, 3.0, 2.8, 1.1, 2.3, 3.0)
T1 = c("a", NA, "e1", NA, "a3", "f1", NA)
T2 = c("a", "b1", NA, "b2", "a3", "f1", "f3")
T3 = c(NA, NA, "e1", "b2", NA, "a4", "f3")
T4 = c(NA, "b1", "e1", NA, NA, NA, NA)
T5 = c(NA, NA, NA, NA, NA, "a4", NA)
T6 = c(NA, NA, NA, NA, NA, NA, NA)
T7 = c(NA, NA, NA, NA, NA, NA, NA)
T8 = c(NA, NA, NA, NA, NA, NA, NA)
T9 = c(NA, NA, NA, NA, NA, NA, NA)
T10= c(NA, NA, NA, NA, NA, NA, NA)
df2 <- data.frame(A, P, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10)
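I know how to do the opposite (removing all duplicates), so I tried changing that code to remove the non-duplicates instead. However, it keeps only one record from each set of duplicates, and the values in columns "A" and "P" are removed. I then tried running the code on the "T" columns only, but then it did not even return a data frame. This is my first attempt: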
df2 <- as.data.frame(t(apply(df1, 1, function(x) {x[!duplicated(x)] <- NA; x})))
And this is my attempt to restrict the code to certain columns:
df2 <- as.data.frame(t(apply(select_if(df1, grepl("T^[0-9]+$", colnames(df1)==T)), 1, function(x) {x[!duplicated(x)] <- NA; x})))
Any suggestions would be greatly appreciated, thanks.
Answer 0 (score: 3)
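You also need duplicated(x, fromLast = TRUE) to catch every occurrence of a repeated value, because duplicated(x) on its own does not flag the first occurrence. Combine the two with |, i.e.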
i1 <- t(apply(df1[-c(1, 2)], 1, function(i)duplicated(i)|duplicated(i, fromLast = TRUE)))
df1[-c(1, 2)][!i1] <- NA
df1
# A P T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
#1 4.3 4.2 a a <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#2 0.2 2.1 <NA> b1 <NA> b1 <NA> <NA> <NA> <NA> <NA> <NA>
#3 3.7 3.0 e1 <NA> e1 e1 <NA> <NA> <NA> <NA> <NA> <NA>
#4 1.5 2.8 <NA> b2 b2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#5 0.5 1.1 a3 a3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#6 1.6 2.3 f1 f1 a4 <NA> a4 <NA> <NA> <NA> <NA> <NA>
#7 2.7 3.0 <NA> f3 f3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
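
To see why both calls are needed, here is a minimal sketch on a single vector, followed by a variant that picks the "T" columns by name instead of by position. The names `x`, `keep`, `df`, `t_cols` and `mask` are made up for this illustration, and `df` is a cut-down version of the question's data. The pattern "^T[0-9]+$" is an assumption about what the question intended: in "T^[0-9]+$" the `^` anchor comes after the `T`, so that pattern cannot match any column name.

```r
# The T values of row 3 from the question, with the trailing NAs left out
x <- c("e1", "a1", "e1", "e1")

duplicated(x)                   # FALSE FALSE  TRUE  TRUE  (first "e1" is not flagged)
duplicated(x, fromLast = TRUE)  #  TRUE FALSE  TRUE FALSE  (last "e1" is not flagged)

# Combining both directions flags every copy of a repeated value
keep <- duplicated(x) | duplicated(x, fromLast = TRUE)
keep                            #  TRUE FALSE  TRUE  TRUE

# Same idea on a small data frame, selecting the T columns by name
df <- data.frame(A  = c(4.3, 0.2),
                 T1 = c("a", "a1"),
                 T2 = c("a", "b1"),
                 T3 = c("c", "c1"),
                 T4 = c(NA, "b1"))

# Logical vector marking columns named T followed by digits (assumed pattern)
t_cols <- grepl("^T[0-9]+$", names(df))

# apply() returns one column per row, so transpose back to rows x columns
mask <- t(apply(df[t_cols], 1, function(r) {
  duplicated(r) | duplicated(r, fromLast = TRUE)
}))

# Blank out every value that is not repeated within its row
df[t_cols][!mask] <- NA
df
```

Because only the "T" columns are passed to apply(), column "A" is never coerced to character and keeps its numeric type. NA cells that repeat within a row count as duplicates of each other, but they are already NA, so the result is unaffected.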