Question

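I have a data frame with 5 columns and many rows, in which only the values of the first 3 columns repeat. (In short, it is a volume built from several volumes, so the same coordinates (x, y, z) appear with different labels, and I want to eliminate the repeated coordinates.)

How can I eliminate these rows using R commands?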

Thanks, AV

Answer 1

You can use the duplicated function, for example:

# create an example data.frame
Lab1<-letters[1:10]
Lab2<-LETTERS[1:10]
x <- c(3,4,3,3,4,2,4,3,9,0)
y <- c(3,4,3,5,4,2,1,5,7,2)
z <- c(8,7,8,8,4,3,1,8,6,3)
DF <- data.frame(Lab1,Lab2,x,y,z)

> DF
   Lab1 Lab2 x y z
1     a    A 3 3 8
2     b    B 4 4 7
3     c    C 3 3 8
4     d    D 3 5 8
5     e    E 4 4 4
6     f    F 2 2 3
7     g    G 4 1 1
8     h    H 3 5 8
9     i    I 9 7 6
10    j    J 0 2 3

# remove rows having repeated x,y,z 
DF2 <- DF[!duplicated(DF[,c('x','y','z')]),]

> DF2
   Lab1 Lab2 x y z
1     a    A 3 3 8
2     b    B 4 4 7
4     d    D 3 5 8
5     e    E 4 4 4
6     f    F 2 2 3
7     g    G 4 1 1
9     i    I 9 7 6
10    j    J 0 2 3
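
If the last row of each coordinate group should be kept instead of the first, duplicated accepts a fromLast argument. The following is a minimal sketch that rebuilds the example data from above; the name DF_last is only illustrative and does not come from the original answer.

```r
# rebuild the example data so the snippet runs on its own
DF <- data.frame(
  Lab1 = letters[1:10],
  Lab2 = LETTERS[1:10],
  x = c(3,4,3,3,4,2,4,3,9,0),
  y = c(3,4,3,5,4,2,1,5,7,2),
  z = c(8,7,8,8,4,3,1,8,6,3)
)

# fromLast = TRUE scans from the bottom, so within each (x, y, z) group
# the last row is the one that is NOT flagged as a duplicate
DF_last <- DF[!duplicated(DF[, c("x", "y", "z")], fromLast = TRUE), ]

# DF_last is an illustrative name; with this data it keeps rows 3 and 8
# in place of rows 1 and 4
```

Which variant to use depends only on which label should survive; both drop the same number of rows.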

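EDIT:

To choose which of the rows sharing the same coordinates is kept, you can use, for example, the by function (although it is less efficient than the previous method):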

res <- by(DF,
      INDICES=paste(DF$x,DF$y,DF$z,sep='|'),
      FUN=function(equalRows){
             # equalRows is a data.frame with the rows having the same x,y,z
             # for example, here we choose the first row after ordering by Lab1, then Lab2
             row <- equalRows[order(equalRows$Lab1,equalRows$Lab2),][1,]
             return(row)
      })

DF2 <- do.call(rbind.data.frame,res)
> DF2
      Lab1 Lab2 x y z
0|2|3    j    J 0 2 3
2|2|3    f    F 2 2 3
3|3|8    a    A 3 3 8
3|5|8    d    D 3 5 8
4|1|1    g    G 4 1 1
4|4|4    e    E 4 4 4
4|4|7    b    B 4 4 7
9|7|6    i    I 9 7 6
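
A possibly faster way to apply the same selection rule is to sort the data frame first and then reuse duplicated, since it keeps the first row it meets in each group. This is a sketch under the assumption that the selection rule can be expressed as a sort order; DF_sorted and DF_pick are illustrative names, not part of the original answer.

```r
# rebuild the example data so the snippet runs on its own
DF <- data.frame(
  Lab1 = letters[1:10],
  Lab2 = LETTERS[1:10],
  x = c(3,4,3,3,4,2,4,3,9,0),
  y = c(3,4,3,5,4,2,1,5,7,2),
  z = c(8,7,8,8,4,3,1,8,6,3)
)

# sort so that the preferred row of each group comes first
# (same criterion as the by() example: Lab1, then Lab2)
DF_sorted <- DF[order(DF$Lab1, DF$Lab2), ]

# keep the first row of each (x, y, z) group in that sorted order
DF_pick <- DF_sorted[!duplicated(DF_sorted[, c("x", "y", "z")]), ]
```

Unlike the by version, the result keeps the original row names and follows the sort order rather than being grouped by the pasted key, but it selects the same rows for this data.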
