Question

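How can I keep only one observation in a dataset when two of its columns contain duplicated values? For example, in the dataset below, rows 1 and 2 hold the same values in the columns Sepal.Length and Petal.Length: (5.1, 1.4) and (5.1, 1.4).

I want to drop the second row and keep the first.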

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          5.1         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          5.0         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Reproducible test data:

test12 <- head(iris)
test12[2,1] <- 5.1

Thanks in advance.

Answer 1

Use duplicated on just those columns, then keep the rows it does not flag:

test12[!duplicated(test12[,c(1,3)]),]
## or referencing the column names themselves:
test12[!duplicated(test12[,c("Sepal.Length","Petal.Length")]),]

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          5.0         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa
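
To see why this works, it can help to look at the logical vector that duplicated returns before it is negated. The following is a minimal sketch in base R using the question's test data; the names key_cols, dup, first_kept and last_kept are only illustrative. It also shows the fromLast argument, in case the last occurrence should be kept instead of the first.

```r
# Rebuild the test data from the question
test12 <- head(iris)
test12[2, 1] <- 5.1

# Columns that define a "duplicate" (illustrative name)
key_cols <- c("Sepal.Length", "Petal.Length")

# duplicated() marks each row whose key columns repeat an earlier row
dup <- duplicated(test12[, key_cols])
print(dup)
# FALSE  TRUE FALSE FALSE FALSE FALSE

# Keep the first occurrence of each key combination
first_kept <- test12[!dup, ]

# Keep the last occurrence instead by scanning from the bottom
last_kept <- test12[!duplicated(test12[, key_cols], fromLast = TRUE), ]
```

Only the two key columns are compared, so rows 1 and 2 count as duplicates even though their Sepal.Width values differ.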

Answer 2

To keep only the first row:

    row1 <- test12[1, ]

To drop the second row of the data frame:

    dropRow <- test12[-2, ]
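
The row number 2 above is hard-coded. As a rough sketch of how the position could be looked up instead, assuming duplicates are defined by Sepal.Length and Petal.Length as in the question, the index can be computed with which and duplicated (dup_idx is an illustrative name):

```r
# Rebuild the test data from the question
test12 <- head(iris)
test12[2, 1] <- 5.1

# Positions of rows that repeat an earlier (Sepal.Length, Petal.Length) pair
dup_idx <- which(duplicated(test12[, c("Sepal.Length", "Petal.Length")]))

# Guard against the empty case: test12[-integer(0), ] would select zero rows
dropRow <- if (length(dup_idx) > 0) test12[-dup_idx, ] else test12
```

The guard matters because negating an empty index vector still yields an empty vector, which selects no rows rather than all of them.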

Subset observations based on duplicate values