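In R, I have a data frame called main with a column called Pass.Id, which is the identifier for a particular event. The values in this column are either unique or occur in pairs: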
Row Pass.Id
1 300
2 300
3 301
4 302
5 302
6 303
So I would like to extract rows 1, 2, 4, and 5 into a new data frame.
I have spent a lot of time on this but cannot work it out. Any help is appreciated.
Answer 0 (score: 0)
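This is more complicated than it looks at first. My solution is a bit messy, but it should work, although you may have to adapt it because I don't have your data. I'll explain each piece, and hopefully that helps.
The first step is to find which entries are duplicates.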
duplicated(main$Pass.Id)
# This is a logical vector with TRUE for the second or later occurrence of some value
Because the logical vector above is TRUE only from the second occurrence onward, you need to find the actual values:
main$Pass.Id[duplicated(main$Pass.Id)]
# This is a vector of the same class as main$Pass.Id that contains only the values that occur more than once
Then find the rows whose Pass.Id is among the elements of that vector, so that they can be extracted.
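To make the first two steps concrete, here is a minimal sketch that rebuilds the six example rows from the question. The one-column data frame is an assumption for illustration (the real main presumably has more columns), and the names dup_flags and dup_values are hypothetical.

```r
# Sample data copied from the question (assumed to be the only column)
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# Step 1: TRUE marks the second (or later) occurrence of a value
dup_flags <- duplicated(main$Pass.Id)
print(dup_flags)
# FALSE  TRUE FALSE FALSE  TRUE FALSE

# Step 2: the actual values that occur more than once
dup_values <- main$Pass.Id[dup_flags]
print(dup_values)
# 300 302
```

If a value appeared three times, it would show up twice in dup_values; that is harmless for the %in% test used in the next step.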
main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)]
# This is a logical vector that is TRUE for every occurrence of any value that appears in Pass.Id more than once.
# It differs from the first logical vector because it also includes the first occurrence of a duplicated value (not just the second and any later occurrences).
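As a cross-check, the same logical vector can be built another way: duplicated() accepts a fromLast argument, and combining a forward scan with a backward scan flags every member of a repeated value. This is an alternative sketch, not the method used in the rest of this answer; the sample data and the names in_pair and via_in are assumptions for illustration.

```r
# Sample data copied from the question (assumed to be the only column)
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# Forward scan flags later occurrences; backward scan flags earlier ones
in_pair <- duplicated(main$Pass.Id) | duplicated(main$Pass.Id, fromLast = TRUE)

# The %in% construction described above
via_in <- main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)]

print(in_pair)
# TRUE  TRUE FALSE  TRUE  TRUE FALSE
print(identical(in_pair, via_in))
# TRUE
```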
There are many ways to extract the rows for which a logical vector is TRUE. In base R, you can:
main[main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)], ]
# Don't forget the comma, which says you're extracting rows.
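Put together, the base R approach can be tried end to end on the example from the question. This is a sketch under the assumption that main looks like the six rows shown there; is_paired and paired are hypothetical names.

```r
# Sample data copied from the question (assumed to be the only column)
main <- data.frame(Pass.Id = c(300, 300, 301, 302, 302, 303))

# TRUE for every row whose Pass.Id occurs more than once
is_paired <- main$Pass.Id %in% main$Pass.Id[duplicated(main$Pass.Id)]

# Keep those rows; drop = FALSE keeps the result a data frame
# even though this toy example has a single column
paired <- main[is_paired, , drop = FALSE]
print(paired)
#   Pass.Id
# 1     300
# 2     300
# 4     302
# 5     302
```

The drop = FALSE argument only matters here because the toy data has one column; with several columns, main[is_paired, ] already returns a data frame.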
With dplyr, you can do this:
library(dplyr)
filter(main, Pass.Id %in% Pass.Id[duplicated(Pass.Id)])