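I searched the forum for an answer to this question but could not find one. I want to loop over the unique values of a data frame column (IN_FID) and add the values of another column (NEAR_FID) associated with that value (there may be one or more) to a list. The IN_FID is then added to a list as well. If any of its NEAR_FID values has already been seen earlier in the process, the IN_FID is not added to the list. I know I have not included it in the code, but ideally I would also like to loop over the IN_FID values in random rather than sequential order. What am I doing wrong in this code?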
eagle
   IN_FID NEAR_FID
1       2        1
2       2        2
3       2        3
4       8        4
5       9        2
6       9        7
7       9        8
8       9        9
9      16        2
10     16       11
11     21       12
p.good = list()
p.bad = list()
INFIDS = unique(eagle$IN_FID)
NEARFIDS = unique(eagle$NEAR_FID)
t.used = NEARFIDS
for (i in INFIDS) {
  sub = eagle[eagle$IN_FID == i, ]
  x = sub$NEAR_FID
  if (all(x) %in% t.used){
    p.good = c(p.good, i)
    t.used[t.used != all(x)]
  } else {
    p.bad = c(p.bad, i)
  }
}
The desired output is:
p.good
[1] 2 8 21 (because NEAR_FID of 2 is present in 9 and 16)
p.bad
[1] 9 16
t.used
= empty because it will have used the values during the loop
Answer 0 (score: 1)
You can use the function duplicated():
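# Rows whose NEAR_FID value already appeared in an earlier row
index_dup = which(duplicated(eagle$NEAR_FID))
# IN_FID values that own at least one such row
p.bad = unique(eagle$IN_FID[index_dup])
# Every remaining IN_FID is good; setdiff() also gives the right answer when p.bad is empty
p.good = setdiff(unique(eagle$IN_FID), p.bad)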
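To try the duplicated() approach on the question's data, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the table in the question, and using setdiff() to derive p.good is one possible formulation, not necessarily the only one.

```r
# Rebuild the sample data from the question
eagle <- data.frame(
  IN_FID   = c(2, 2, 2, 8, 9, 9, 9, 9, 16, 16, 21),
  NEAR_FID = c(1, 2, 3, 4, 2, 7, 8, 9, 2, 11, 12)
)

# Rows whose NEAR_FID was already seen in an earlier row
index_dup <- which(duplicated(eagle$NEAR_FID))

# IN_FID values that own at least one repeated NEAR_FID
p.bad <- unique(eagle$IN_FID[index_dup])

# All other IN_FID values
p.good <- setdiff(unique(eagle$IN_FID), p.bad)

print(p.good)  # 2 8 21
print(p.bad)   # 9 16
```

Because duplicated() marks only the second and later occurrences of a value, the result depends on row order: the first IN_FID to use a given NEAR_FID keeps it.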
For the randomization, you can shuffle the row order of the data and then apply the code above again:
eagle_random <- eagle[sample(1:nrow(eagle)), ]
Answer 1 (score: 0)
Instead of lists, declare them as vectors:
p.good = NULL
p.bad = NULL
INFIDS = unique(eagle$IN_FID)
NEARFIDS = unique(eagle$NEAR_FID)
t.used = NEARFIDS
Instead of min:max, iterate over the elements of the vector with for (i in INFIDS):
library(dplyr)  # provides %>% and filter()

for (i in INFIDS) {
  x = (eagle %>% filter(IN_FID == i))$NEAR_FID  # combine into single statement
  if (all(x %in% t.used)) {                     # was all(x) %in% t.used before
    p.good = c(p.good, i)
    t.used = t.used[!(t.used %in% x)]           # was t.used != all(x)
  } else {
    p.bad = c(p.bad, i)
  }
}
Output:
p.good
[1] 2 8 21
p.bad
[1] 9 16
t.used
[1] 7 8 9 11 # some values were not eliminated as you expected
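If you would rather not depend on dplyr, the same loop can be sketched in base R. The helper name split_fids and its visit_order argument are hypothetical, introduced only for this illustration, and the data frame is rebuilt from the question's table.

```r
# Hypothetical helper: classify IN_FID values in the given visiting order
split_fids <- function(df, visit_order = unique(df$IN_FID)) {
  good <- c()
  bad <- c()
  unused <- unique(df$NEAR_FID)  # NEAR_FID values not yet claimed
  for (i in visit_order) {
    x <- df$NEAR_FID[df$IN_FID == i]
    if (all(x %in% unused)) {
      good <- c(good, i)
      unused <- unused[!(unused %in% x)]  # claim these values
    } else {
      bad <- c(bad, i)
    }
  }
  list(good = good, bad = bad, unused = unused)
}

# Sample data from the question
eagle <- data.frame(
  IN_FID   = c(2, 2, 2, 8, 9, 9, 9, 9, 16, 16, 21),
  NEAR_FID = c(1, 2, 3, 4, 2, 7, 8, 9, 2, 11, 12)
)

res <- split_fids(eagle)
print(res$good)    # 2 8 21
print(res$bad)     # 9 16
print(res$unused)  # 7 8 9 11
```

The leftover values 7, 8, 9 and 11 belong to the rejected IN_FID values 9 and 16, which is why t.used does not end up empty.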
---- Random sampling ----
Change for (i in INFIDS) to for (i in sample(INFIDS)). Use set.seed(1) to control the random sampling.
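A minimal sketch of the randomized loop follows, rebuilding the question's data so it runs on its own. It shuffles by index, which behaves like sample(INFIDS) for vectors longer than one element and avoids the case where sample() treats a single number n as 1:n. Which of 2, 9 and 16 ends up in p.good depends on the visiting order, so no fixed output is shown.

```r
# Sample data from the question
eagle <- data.frame(
  IN_FID   = c(2, 2, 2, 8, 9, 9, 9, 9, 16, 16, 21),
  NEAR_FID = c(1, 2, 3, 4, 2, 7, 8, 9, 2, 11, 12)
)

set.seed(1)  # fix the seed so the shuffle is reproducible

INFIDS <- unique(eagle$IN_FID)
# Shuffle by index; safe even if INFIDS has length one
visit_order <- INFIDS[sample(length(INFIDS))]

p.good <- NULL
p.bad <- NULL
t.used <- unique(eagle$NEAR_FID)

for (i in visit_order) {
  x <- eagle$NEAR_FID[eagle$IN_FID == i]
  if (all(x %in% t.used)) {
    p.good <- c(p.good, i)
    t.used <- t.used[!(t.used %in% x)]  # drop the values just claimed
  } else {
    p.bad <- c(p.bad, i)
  }
}

print(visit_order)
print(p.good)
print(p.bad)
```

On this data, 8 and 21 are accepted in every order because no other IN_FID shares their NEAR_FID values, while exactly one of 2, 9 and 16 is accepted: whichever is visited first claims NEAR_FID 2.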