Remove duplicate observations across columns within specific rows

Time: 2019-01-03 18:56:50

Tags: r dataframe for-loop unique

This is a short example of the data frame I want to clean:

L3 <- LETTERS[1:5]
fac <- c("fish", "meat", "chicken", "veg", "shrimp")
set.seed(1)
(d <- data.frame(code = sample(c(11:15)), upc = sample(c(1:5)),
                 desc = sample(fac), desc1 = fac, desc2 = sample(fac),
                 desc3 = fac, desc4 = sample(fac)))
  code upc    desc   desc1   desc2   desc3   desc4
1   12   5    meat    fish chicken    fish  shrimp
2   15   4    fish    meat  shrimp    meat    fish
3   14   2 chicken chicken     veg chicken    meat
4   13   3     veg     veg    fish     veg     veg
5   11   1  shrimp  shrimp    meat  shrimp chicken

I am trying to write a general function (using a for loop and unique()) that checks, for each row separately, the entries in columns 3 to 7 and keeps only the unique values that are not duplicated in the other columns (i.e., if a row contains fish in all the desc columns, the new row should contain fish in only one column). More specifically, the desired result is:
  

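1 Answer:

Answer 0: (score: 2)

We can use duplicated to find the elements that are repeated within each row of the "desc" columns and replace them with a blank "":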

nm1 <- grep('desc', names(d))
d[nm1] <- t(apply(d[nm1], 1, function(x) {replace(x, duplicated(x), "")}))
d
#  code upc    desc desc1   desc2 desc3   desc4
#1   12   5    meat  fish chicken        shrimp
#2   15   4    fish  meat  shrimp              
#3   14   2 chicken           veg          meat
#4   13   3     veg          fish              
#5   11   1  shrimp          meat       chicken
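
To see what happens inside the apply call, the following is a minimal sketch on a single row. The vector x is copied by hand from the desc columns of row 1 of the example data, and the names dup and cleaned are introduced here only for illustration.

```r
# desc values from row 1 of the example data (typed in by hand)
x <- c(desc = "meat", desc1 = "fish", desc2 = "chicken",
       desc3 = "fish", desc4 = "shrimp")

# duplicated() flags every repeat after the first occurrence,
# so only the second "fish" (position 4) is TRUE
dup <- duplicated(x)

# replace() blanks the flagged positions and leaves the rest untouched
cleaned <- replace(x, dup, "")
print(cleaned)
```

Because duplicated keeps the first occurrence, the value that survives is always the one in the leftmost column.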

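Or use a for loop (assuming the columns are of class character, or that the blank "" has been added as a factor level before the assignment):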

for(i in seq_len(nrow(d))) d[i, nm1] <- replace(d[i, nm1], 
                                     duplicated(unlist(d[i, nm1])), '')
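
Since the question asks for a general function, the loop could be wrapped up roughly as follows. This is only a sketch: blank_row_dups is a hypothetical helper name, not part of the answer, and it sidesteps the factor caveat by converting the selected columns to character first. The small data frame is typed in by hand from the first two rows of the example.

```r
# Hypothetical helper: blank out within-row repeats in the selected columns
blank_row_dups <- function(df, cols) {
  # Convert to character so "" can be assigned even if the columns are factors
  df[cols] <- lapply(df[cols], as.character)
  for (i in seq_len(nrow(df))) {
    vals <- unlist(df[i, cols], use.names = FALSE)
    vals[duplicated(vals)] <- ""   # keep only the first occurrence in the row
    df[i, cols] <- as.list(vals)   # one value per column
  }
  df
}

# First two rows of the example data, deliberately stored as factors
d2 <- data.frame(
  code  = c(12, 15),
  desc  = c("meat", "fish"),
  desc1 = c("fish", "meat"),
  desc2 = c("chicken", "shrimp"),
  desc3 = c("fish", "meat"),
  desc4 = c("shrimp", "fish"),
  stringsAsFactors = TRUE
)

res <- blank_row_dups(d2, grep("desc", names(d2)))
print(res)
```

Converting to character up front is one design choice among several; adding "" to the factor levels, as the answer suggests, would keep the columns as factors instead.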