Here is a short example of the data frame I want to clean:
L3 <- LETTERS[1:5]
fac <- c("fish", "meat", "chicken", "veg", "shrimp")
set.seed(1)
(d <- data.frame(code = sample(c(11:15)),
                 upc = sample(c(1:5)), desc = sample(fac),
                 desc1 = fac, desc2 = sample(fac),
                 desc3 = fac, desc4 = sample(fac)))
  code upc    desc   desc1   desc2   desc3   desc4
1   12   5    meat    fish chicken    fish  shrimp
2   15   4    fish    meat  shrimp    meat    fish
3   14   2 chicken chicken     veg chicken    meat
4   13   3     veg     veg    fish     veg     veg
5   11   1  shrimp  shrimp    meat  shrimp chicken
I am trying to write a general function (using unique() and a for loop) that checks, for each row separately, the entries in columns 3 through 7 and keeps only the unique values that are not repeated in the other columns (i.e., if a row contains fish in all desc columns, the new row should contain fish in only one column). More specifically, the desired result is:
Answer 0 (score: 2)
We can use duplicated to find the elements that are repeated within each row of the "desc" columns and assign a blank ("") to them:
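# positions of the columns whose names contain "desc"
nm1 <- grep('desc', names(d))
# apply() returns one column per row here, so t() restores the row layout
d[nm1] <- t(apply(d[nm1], 1, function(x) {replace(x, duplicated(x), "")}))
d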
#  code upc    desc desc1   desc2 desc3   desc4
#1   12   5    meat  fish chicken        shrimp
#2   15   4    fish  meat  shrimp
#3   14   2 chicken           veg          meat
#4   13   3     veg          fish
#5   11   1  shrimp          meat       chicken
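To make the duplicated step concrete, here is a minimal sketch on a single row. The vector x is copied by hand from the desc columns of row 1 in the example; the names x, dup, and cleaned are only illustrative and do not appear in the original answer.

```r
# desc values of row 1 from the example data frame, typed in by hand
x <- c("meat", "fish", "chicken", "fish", "shrimp")

# duplicated() flags every occurrence after the first one
dup <- duplicated(x)
dup
# [1] FALSE FALSE FALSE  TRUE FALSE

# replace() blanks the flagged positions and leaves the rest in place
cleaned <- replace(x, dup, "")
cleaned
# [1] "meat"    "fish"    "chicken" ""        "shrimp"
```

Because the first occurrence is never flagged, each value survives exactly once per row, in the leftmost column where it appears.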
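Or use a for loop (assuming the columns are of class character; for factor columns, the blank "" would first have to be a valid level before doing the assignment):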
for(i in seq_len(nrow(d))) d[i, nm1] <- replace(d[i, nm1],
duplicated(unlist(d[i, nm1])), '')
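If the desc columns are factors (the default of data.frame() before R 4.0.0), one way to satisfy the character assumption is to convert them first. The sketch below is only an illustration: d2 and nm2 are hypothetical stand-ins for d and nm1, and the loop body is a slight variant that works on a plain character vector rather than on a one-row data frame.

```r
# Hypothetical small data frame, deliberately built with factor columns
d2 <- data.frame(code  = 11:12,
                 desc  = c("fish", "veg"),
                 desc1 = c("fish", "meat"),
                 stringsAsFactors = TRUE)
nm2 <- grep("desc", names(d2))

# Convert every desc column to character so that "" can be assigned
d2[nm2] <- lapply(d2[nm2], as.character)

# Row by row: pull the desc values out as a vector, blank the repeats,
# and write the cleaned values back into the same row
for (i in seq_len(nrow(d2))) {
  vals <- unlist(d2[i, nm2])
  d2[i, nm2] <- replace(vals, duplicated(vals), "")
}
d2
```

Converting up front avoids the case where "" is not among the factor levels, in which R would store NA and emit a warning instead of assigning the blank.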