Suppose I have a data frame df with two rows of strings. I need to split each string on spaces, unlist the result, find the anti-intersection (the elements that occur in only one row), and reuse those in a list. I can do it by brute force, working with each row individually, but there can be more than two rows. My working solution so far is below; there must be a simpler way that does not access each row separately. Thanks!
df = structure(list(A = structure(1:2, .Label = c("R1", "R2"), class = "factor"),
B = c("a b c d e f g o l",
"b h i j k l m n o p q"
)), .Names = c("A", "B"), row.names = c(NA, -2L), class = "data.frame")
dat1 = unlist(strsplit(df[1,2]," "))
dat2 = unlist(strsplit(df[2,2]," "))
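f <- function(...) {
  aux <- list(...)
  # Record which input vector each element came from
  ind <- rep(seq_along(aux), sapply(aux, length))
  x <- unlist(aux)
  # Keep only elements that occur exactly once across all inputs
  boo <- !(duplicated(x) | duplicated(x, fromLast = TRUE))
  split(x[boo], ind[boo])
}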
excl = (f(dat1, dat2))
L <- list(excl[[1]],excl[[2]])
cfun <- function(L) {
  # Pad a vector with empty strings up to length len
  pad.na <- function(x, len) {
    c(x, rep("", len - length(x)))
  }
  maxlen <- max(sapply(L, length))
  print(maxlen)
  do.call(data.frame, lapply(L, pad.na, len = maxlen))
}
a = cfun(L)
What I had:
     A                     B
1 Food         a b c d e f g
2 HABA b h i j k l m n o p q
What I got:
  c..a....c....d....e....f....g.......... c..h....i....j....k....m....n....p....q..
1                                       a                                         h
2                                       c                                         i
3                                       d                                         j
4                                       e                                         k
5                                       f                                         m
6                                       g                                         n
7                                                                                 p
8                                                                                 q
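One way to avoid touching each row by hand, sketched here under the assumption that column B is a character vector: `strsplit` is already vectorised over rows, so its result can be handed to the existing `f` with `do.call`. The data frame is rebuilt with `data.frame()` only to keep the example self-contained.

```r
df <- data.frame(
  A = c("R1", "R2"),
  B = c("a b c d e f g o l", "b h i j k l m n o p q"),
  stringsAsFactors = FALSE
)

# Same f as above: keep elements that occur exactly once overall
f <- function(...) {
  aux <- list(...)
  ind <- rep(seq_along(aux), sapply(aux, length))
  x <- unlist(aux)
  boo <- !(duplicated(x) | duplicated(x, fromLast = TRUE))
  split(x[boo], ind[boo])
}

# strsplit() returns one character vector per row;
# do.call() passes them all to f() as separate arguments
excl <- do.call(f, strsplit(df$B, " ", fixed = TRUE))
```

Since `excl` is already a list, it can go straight into `cfun(excl)`. Two caveats: `split` drops any row that has no unique elements left, and an element repeated within a single row is also removed by `duplicated`.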
Edit: The goal is to eliminate elements that are shared between columns. That is, if "4" is present in row 1 and appears in any other row, it should be removed everywhere. New test set:
df1 = structure(list(A = structure(1:3, .Label = c("R1", "R2", "R3"
), class = "factor"), B = c("1 4 78 5 4 6 7 0", "2 3 76 8 2 1 8 0",
"4 7 1 2")), .Names = c("A", "B"), row.names = c(NA, -3L), class = "data.frame")
Current output from suggested code:
   a  b  c
1  4  2  4
2 78  3  7
3  5 76  2
4  4  8 NA
5  6  2 NA
6  7  8 NA
7  0  0 NA
2, 4, and 7 should not be there, as they appear in more than one column. Bottom line: the output should contain only elements that are unique across all columns. Thanks!
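A minimal sketch of the behaviour described in the edit, assuming that "unique" means "occurs in exactly one row" and that repeats within a single row should be collapsed first. The helper name `unique_per_row` is made up for illustration, and the data frame is rebuilt with `data.frame()` to keep the example self-contained.

```r
df1 <- data.frame(
  A = c("R1", "R2", "R3"),
  B = c("1 4 78 5 4 6 7 0", "2 3 76 8 2 1 8 0", "4 7 1 2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only tokens that occur in exactly one row
unique_per_row <- function(strings, ids) {
  # Split all rows at once and drop repeats within a row
  tokens <- lapply(strsplit(strings, " ", fixed = TRUE), unique)
  names(tokens) <- ids
  # Count in how many rows each token occurs
  counts <- table(unlist(tokens, use.names = FALSE))
  shared <- names(counts)[counts > 1]
  # Drop every token that is shared between rows
  lapply(tokens, function(x) x[!x %in% shared])
}

res <- unique_per_row(df1$B, as.character(df1$A))

# Pad with NA so the list can be shown as columns of equal length
maxlen <- max(lengths(res))
padded <- lapply(res, function(x) c(x, rep(NA_character_, maxlen - length(x))))
out <- as.data.frame(padded, stringsAsFactors = FALSE)
```

With this test set, `res$R1` is "78" "5" "6", `res$R2` is "3" "76" "8", and `res$R3` is empty, because every element of row 3 also appears in another row. Under this reading 0 and 1 are dropped as well, since each occurs in more than one row. Unlike the `split`-based approach, a row with nothing left is kept as an empty entry (an all-NA column in `out`).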