Question

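In R, I have a data frame called main with a column named Pass.Id, which is an identifier for a particular event. The values in this column are either unique or occur in pairs: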

Row  Pass.Id
1      300
2      300
3      301
4      302
5      302
6      303

So I would like to extract rows 1, 2, 4 and 5 into a new data frame.

I have spent a lot of time on this but cannot work it out. Any help is appreciated.

Answer 1

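This is more complicated than it looks at first glance. My solution is a bit messy, but it should work, although you may have to adapt it because I don't have your input data. I'll explain each piece, which will hopefully help.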

The first step is to find the positions of the values that are duplicates.

duplicated(main$Pass.Id)
# This is a logical vector with TRUE for the second or later occurrence of some value

Because the logical vector above is TRUE only from the second occurrence onward, you need to get the actual values:

main$Pass.Id[duplicated(main$Pass.Id)]
# This is a vector of the same class as main$Pass.Id that contains only the values occurring more than once

Then find the rows whose Pass.Id is one of the elements of that vector, so they can be extracted.

main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)]
# This is a logical vector that is TRUE for each occurrence of any value that appears in main$Pass.Id more than once.
# It differs from the first logical vector because it includes the first occurrence of a duplicated value (not just the second and any later occurrences).
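
To make the three steps concrete, here is a small sketch that prints each intermediate vector. The data frame is rebuilt by hand from the six sample rows in the question, so it is only an assumption about what the real main looks like; the names dup_later, dup_values and in_pair are made up for illustration.

```r
# Sample data reconstructed from the question (assumed to match the real `main`)
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# Step 1: TRUE only for the second or later occurrence of a value
dup_later <- duplicated(main$Pass.Id)
print(dup_later)
# FALSE  TRUE FALSE FALSE  TRUE FALSE

# Step 2: the values that occur more than once
dup_values <- main$Pass.Id[dup_later]
print(dup_values)
# 300 302

# Step 3: TRUE for every occurrence of those values, including the first
in_pair <- main$Pass.Id %in% dup_values
print(which(in_pair))
# 1 2 4 5
```

The positions printed in the last step are exactly the rows the question asks for.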

There are many ways to extract the rows for which a logical vector is TRUE. In base R, you can:

main[main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)], ]
# Don't forget the comma, which says you're extracting rows.
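
As a possible alternative in base R (a different technique from the %in% approach above, not part of the original answer), duplicated() can be called a second time with fromLast = TRUE so that earlier occurrences are flagged as well. The sketch below again assumes the six-row sample from the question; is_dup and pairs are illustrative names.

```r
# Sample data reconstructed from the question (assumed to match the real `main`)
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# duplicated() flags later occurrences; fromLast = TRUE flags earlier ones.
# Their union marks every row whose Pass.Id appears more than once.
is_dup <- duplicated(main$Pass.Id) | duplicated(main$Pass.Id, fromLast = TRUE)

# drop = FALSE keeps the result a data frame even though this sample
# has only one column; with several columns it makes no difference.
pairs <- main[is_dup, , drop = FALSE]
print(pairs)
```

On this sample, pairs keeps the original row names 1, 2, 4 and 5.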

With dplyr, you can do this:

library(dplyr)  # without this, filter() resolves to stats::filter()
filter(main, Pass.Id %in% Pass.Id[duplicated(Pass.Id)])
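
Another way to write this in dplyr, offered only as a sketch and not as the answer's own method, is to group by the identifier and keep the groups that have more than one row. It assumes dplyr is installed and reuses the six-row sample from the question; pairs is an illustrative name.

```r
library(dplyr)

# Sample data reconstructed from the question (assumed to match the real `main`)
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

pairs <- main %>%
  group_by(Pass.Id) %>%   # one group per identifier
  filter(n() > 1) %>%     # keep groups containing more than one row
  ungroup()               # drop the grouping so later steps are unaffected

print(pairs)
```

This version reads closer to the intent ("keep identifiers that appear more than once") and extends naturally to other thresholds, such as n() == 2 for exact pairs.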
