Question

ID Julian Month Year Location Distance
 2  40749  July 2011     8300    39625
 2  41425   May 2013 Hatchery    31325
 3  40749  July 2011     6950    38625
 3  41057   May 2012 Hatchery    31325
 6  40735  July 2011     8300    39650
12  40743  July 2011    11025    42350

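Above is the head() of the data frame I am working with. It contains over 7,000 rows and 3,000 unique ID values. I want to delete every row whose ID value occurs only once. Is this possible? Perhaps the solution is to keep only the rows where the ID is repeated?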

Answer 1

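If d is your data frame, I would use duplicated to find the rows with repeated IDs. Called with fromLast = FALSE, it flags every occurrence of an ID except the first; with fromLast = TRUE, every occurrence except the last. Combining the two calls with | therefore catches all rows of a repeated ID, including the first and last.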

d[(duplicated(d$ID, fromLast = FALSE) | duplicated(d$ID, fromLast = TRUE)),]
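To see why both passes are needed, here is a small sketch on toy data modelled on the head() in the question. Only three of the columns are reproduced, and the names dup_forward, dup_backward and kept are illustrative, not part of the original answer.

```r
# Toy data based on the head() shown in the question (subset of columns)
d <- data.frame(
  ID = c(2, 2, 3, 3, 6, 12),
  Julian = c(40749, 41425, 40749, 41057, 40735, 40743),
  Location = c("8300", "Hatchery", "6950", "Hatchery", "8300", "11025")
)

# Flags every occurrence of an ID after its first one
dup_forward <- duplicated(d$ID, fromLast = FALSE)

# Flags every occurrence of an ID before its last one
dup_backward <- duplicated(d$ID, fromLast = TRUE)

# A row is kept if either pass flags it, so IDs 6 and 12 drop out
kept <- d[dup_forward | dup_backward, ]
print(kept)
```

On this sample, either pass alone would return only one of the two rows for each of IDs 2 and 3; the union returns both.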

This double duplicated approach has a variety of uses:

Finding ALL duplicate rows, including "elements with smaller subscripts"

How to get a subset of a dataframe which only has elements which appear in the set more than once in R

How to identify "similar" rows in R?

Answer 2

Here is how I would do it:

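# Start with an empty result; rbind() treats NULL as "nothing to bind yet"
new.dataframe <- NULL
ids <- unique(dataframe$ID)
for (id in ids) {
  # All rows belonging to the current ID
  temp <- dataframe[dataframe$ID == id, ]
  # Keep them only if the ID occurs more than once
  if (nrow(temp) > 1) {
    new.dataframe <- rbind(new.dataframe, temp)
  }
}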

This removes every ID that has only one row.
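With 7,000 rows and 3,000 IDs, growing a data frame with rbind inside a loop can get slow. As a possible alternative (a different technique, not the one this answer uses), the same filter can be sketched by counting IDs with table() and subsetting once. The sample data and the names counts and repeated_ids are assumptions for illustration.

```r
# Toy data based on the head() shown in the question (subset of columns)
dataframe <- data.frame(
  ID = c(2, 2, 3, 3, 6, 12),
  Julian = c(40749, 41425, 40749, 41057, 40735, 40743)
)

# Number of rows per ID
counts <- table(dataframe$ID)

# table() stores the IDs as character names, so compare as character
repeated_ids <- names(counts)[counts > 1]

# Single subsetting step instead of repeated rbind() calls
new.dataframe <- dataframe[as.character(dataframe$ID) %in% repeated_ids, ]
print(new.dataframe)
```

Unlike the loop, which groups the output by ID in order of first appearance, this keeps the rows in their original order.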

Select only rows with the same ID in R
