Question

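I tried searching the forum for an answer to this question but couldn't find one. I want to loop over the unique values of a data frame column (IN_FID) and collect the values of another column (NEAR_FID) associated with each one (there may be one or more). The IN_FID is then added to a list. If any of its NEAR_FID values has already been seen earlier in the process, the IN_FID is not added to the list. I know I haven't included it in the code, but ideally I would also like to loop over the IN_FID values in random rather than sequential order. What am I doing wrong in this code?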

eagle
   IN_FID NEAR_FID
1       2        1
2       2        2
3       2        3
4       8        4
5       9        2
6       9        7
7       9        8
8       9        9
9      16        2
10     16       11
11     21       12

p.good = list()
p.bad = list()
INFIDS = unique(eagle$IN_FID)
NEARFIDS = unique(eagle$NEAR_FID)
t.used = NEARFIDS

for (i in INFIDS) {
sub = eagle[eagle$IN_FID == i, ]
x = sub$NEAR_FID
if (all(x) %in% t.used){
    p.good = c(p.good, i)
    t.used[t.used != all(x)]

} else { 
    p.bad = c(p.bad, i)
}

The desired output is:

p.good
[1] 2 8 21  (because NEAR_FID of 2 is present in 9 and 16)
p.bad
[1] 9 16
t.used
= empty because it will have used the values during the loop

Answer 1

You can use the function duplicated():

index_dup = which(duplicated(eagle$NEAR_FID))

p.bad = unique(eagle$IN_FID[index_dup])

index_bad = c()
for (i in p.bad){
  index_bad = c(index_bad,which(eagle$IN_FID == i))
}

p.good = unique(eagle$IN_FID[-index_bad])
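
The loop that builds index_bad can also be avoided. The following is a minimal sketch, assuming the eagle data frame from the question (re-created here so the snippet runs on its own). Using setdiff() is a substitution for the negative indexing above, not part of the original answer:

```r
# Re-create the example data from the question
eagle <- data.frame(
  IN_FID   = c(2, 2, 2, 8, 9, 9, 9, 9, 16, 16, 21),
  NEAR_FID = c(1, 2, 3, 4, 2, 7, 8, 9, 2, 11, 12)
)

# IN_FID values that own a row whose NEAR_FID already appeared in an earlier row
p.bad <- unique(eagle$IN_FID[duplicated(eagle$NEAR_FID)])

# Every remaining IN_FID is good
p.good <- setdiff(unique(eagle$IN_FID), p.bad)

print(p.good)
print(p.bad)
```

One reason to prefer setdiff() here: if no NEAR_FID is duplicated, index_bad is empty, and negative indexing with an empty vector selects nothing, so p.good would come back empty instead of containing every IN_FID.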

For randomization, you can shuffle the row order of the data and then apply the code above again:

eagle_random <- eagle[sample(1:nrow(eagle)), ]

Answer 2

Instead of lists, declare them as vectors:

p.good = NULL
p.bad = NULL

INFIDS = unique(eagle$IN_FID)
NEARFIDS = unique(eagle$NEAR_FID)
t.used = NEARFIDS

Instead of min:max, iterate over the elements of the vector with for (i in INFIDS):

library(dplyr)  # provides %>% and filter()

for (i in INFIDS) {
    x = (eagle %>% filter(IN_FID == i))$NEAR_FID   # combine into single statement
    if (all(x %in% t.used)) {    # was all(x) %in% t.used before
        p.good = c(p.good, i)
        t.used = t.used[!(t.used %in% x)]  # was t.used != all(x)
    } else {
        p.bad = c(p.bad, i)
    }
}

Output:

p.good
[1] 2  8 21

p.bad
[1] 9 16

t.used
[1] 7  8  9 11    # some values were not eliminated as you expected

---- Random sampling ----

Change for (i in INFIDS)

to for (i in sample(INFIDS)). Use set.seed(1) to make the random order reproducible.
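
Putting the pieces together, a randomized run might look like the sketch below. It assumes the eagle data frame from the question (re-created here) and uses base R subsetting instead of dplyr. The helper name classify_fids and its return fields are hypothetical, introduced only for illustration:

```r
# Re-create the example data from the question
eagle <- data.frame(
  IN_FID   = c(2, 2, 2, 8, 9, 9, 9, 9, 16, 16, 21),
  NEAR_FID = c(1, 2, 3, 4, 2, 7, 8, 9, 2, 11, 12)
)

# Hypothetical helper: runs the loop over the IN_FID values in the given order
classify_fids <- function(df, fid_order) {
  p.good <- NULL
  p.bad <- NULL
  t.used <- unique(df$NEAR_FID)          # NEAR_FID values still available
  for (i in fid_order) {
    x <- df$NEAR_FID[df$IN_FID == i]     # NEAR_FID values for this IN_FID
    if (all(x %in% t.used)) {
      p.good <- c(p.good, i)
      t.used <- t.used[!(t.used %in% x)] # consume the values just used
    } else {
      p.bad <- c(p.bad, i)
    }
  }
  list(good = p.good, bad = p.bad, unused = t.used)
}

INFIDS <- unique(eagle$IN_FID)

set.seed(1)  # fix the seed so the shuffle is repeatable
# Shuffle by index: sample(INFIDS) would sample from 1:n instead
# if INFIDS ever held a single number n
shuffled <- INFIDS[sample(length(INFIDS))]

res <- classify_fids(eagle, shuffled)
print(shuffled)
print(res)
```

Because values are consumed in visiting order, which IN_FID values end up in good or bad can change from one shuffle to the next. For example, if 9 is visited before 2, then 9 claims NEAR_FID 2 and 2 is rejected.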
