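I have a data frame like the one below. Where cluster is NA, I want to look up that row's "id" in the "to" column and fill New_Col with the cluster value of the matching row. Rows that already have a cluster keep it.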
id cluster from to
A NA NA NA
B 2 B D
C 5 C A
D NA NA NA
E 5 E B
F NA NA NA
G 3 G F
Expected output
id cluster from to New_Col
A NA NA NA 5
B 2 B D 2
C 5 C A 5
D NA NA NA 2
E 5 E B 5
F NA NA NA 3
G 3 G F 3
Answer 0 (score: 2)
We can use match:
#Copy cluster value
df$New_col <- df$cluster
#Get NA indices
inds <- is.na(df$New_col)
#Get corresponding cluster values for NA values.
df$New_col[inds] <- with(df, cluster[match(id[inds], to)])
df
# id cluster from to New_col
#1 A NA <NA> <NA> 5
#2 B 2 B D 2
#3 C 5 C A 5
#4 D NA <NA> <NA> 2
#5 E 5 E B 5
#6 F NA <NA> <NA> 3
#7 G 3 G F 3
Data
df <- structure(list(id = structure(1:7, .Label = c("A", "B", "C",
"D", "E", "F", "G"), class = "factor"), cluster = c(NA, 2L, 5L,
NA, 5L, NA, 3L), from = structure(c(NA, 1L, 2L, NA, 3L, NA, 4L
), .Label = c("B", "C", "E", "G"), class = "factor"), to = structure(c(NA,
3L, 1L, NA, 2L, NA, 4L), .Label = c("A", "B", "D", "F"), class = "factor")),
class = "data.frame", row.names = c(NA, -7L))
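To see why the match call above works, it may help to run it on plain vectors first. The sketch below rebuilds the id, to and cluster columns as character and integer vectors (an assumption made for readability; the answer's data uses factors) and shows the positions that match returns.

```r
# Plain-vector copies of the question's columns (illustrative only)
id <- c("A", "B", "C", "D", "E", "F", "G")
to <- c(NA, "D", "A", NA, "B", NA, "F")
cluster <- c(NA, 2L, 5L, NA, 5L, NA, 3L)

# For each id, the position of its first occurrence in `to`
# (NA when the id never appears in `to`)
pos <- match(id, to)
pos
# 3 5 NA 2 NA 7 NA

# Indexing cluster by those positions gives the looked-up values
looked_up <- cluster[pos]
looked_up
# 5 5 NA 2 NA 3 NA
```

Only the entries for rows whose cluster is NA (A, D and F) are actually used by the answer; the other rows keep their own cluster value.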
Answer 1 (score: 1)
Using @Ronak Shah's logic (base R solution):
df$new_col <- ifelse(is.na(df$cluster), df$cluster[match(df$id, df$to)], df$cluster)
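One case the question's data does not cover is an id with an NA cluster that never appears in "to". As a rough sketch, the same ifelse logic is wrapped below in a helper named fill_cluster (a hypothetical name, not part of any answer) and run on made-up data where "H" has no match; under these assumptions that row simply stays NA.

```r
# Hypothetical helper wrapping the one-liner above
fill_cluster <- function(df) {
  # Cluster of the row whose `to` equals each id (NA if there is none)
  looked_up <- df$cluster[match(df$id, df$to)]
  # Keep an existing cluster; otherwise use the looked-up value
  ifelse(is.na(df$cluster), looked_up, df$cluster)
}

# Made-up data: "H" has no cluster and is never referenced in `to`
toy <- data.frame(
  id = c("A", "C", "H"),
  cluster = c(NA, 5L, NA),
  to = c(NA, "A", NA),
  stringsAsFactors = FALSE
)

fill_cluster(toy)
# 5 5 NA
```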
Answer 2 (score: 0)
Here is an approach using a for loop and which.
When the "cluster" column is NA, which finds the index where the "to" column matches the "id" column.
# For each row, keep the existing cluster or look it up via the "to" column
for (i in seq_len(nrow(df))) {
  df$new_col[i] <- ifelse(is.na(df$cluster[i]), df$cluster[which(df$to == df$id[i])], df$cluster[i])
}
Data
df <- data.frame(id= c("A", "B", "C","D", "E", "F", "G"),
cluster = c(NA, 2L, 5L,NA, 5L, NA, 3L),
from =c(NA, "B", "C",NA, "E", NA, "G"),
to = c(NA, "D", "A",NA, "B", NA, "F"), stringsAsFactors = FALSE)
Output
df
id cluster from to new_col
1 A NA <NA> <NA> 5
2 B 2 B D 2
3 C 5 C A 5
4 D NA <NA> <NA> 2
5 E 5 E B 5
6 F NA <NA> <NA> 3
7 G 3 G F 3