Fill data frame values by matching against a different column

Date: 2020-03-22 05:22:20

Tags: r dataframe dplyr

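I have a data frame like the one below. If "cluster" is NA, I want to look up that row's "id" in the "to" column and fill New_Col with the cluster value of the matching row. Otherwise, New_Col should simply copy the row's own cluster value.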

id  cluster from    to
A   NA      NA      NA
B   2       B       D
C   5       C       A
D   NA      NA      NA
E   5       E       B
F   NA      NA      NA
G   3       G       F

Expected output

id  cluster from    to  New_Col
A   NA      NA      NA  5
B   2       B       D   2
C   5       C       A   5
D   NA      NA      NA  2
E   5       E       B   5
F   NA      NA      NA  3
G   3       G       F   3

3 Answers:

Answer 0 (score: 2)

We can use match to look up each NA row's id in the to column:

#Copy cluster value
df$New_col <- df$cluster
#Get NA indices
inds <- is.na(df$New_col)
#Get corresponding cluster values for NA values.
df$New_col[inds] <- with(df, cluster[match(id[inds], to)])
df

#  id cluster from   to New_col
#1  A      NA <NA> <NA>       5
#2  B       2    B    D       2
#3  C       5    C    A       5
#4  D      NA <NA> <NA>       2
#5  E       5    E    B       5
#6  F      NA <NA> <NA>       3
#7  G       3    G    F       3
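To see what match() contributes in the answer above, here is the lookup broken into steps on the question's columns, written as plain character/integer vectors rather than factors for readability. match() returns the position of the first occurrence of each id in to (so if several rows pointed to the same id, only the first would be used), and indexing cluster by those positions gives the values to borrow.

```r
# Columns from the question's data, as plain vectors
id      <- c("A", "B", "C", "D", "E", "F", "G")
cluster <- c(NA, 2L, 5L, NA, 5L, NA, 3L)
to      <- c(NA, "D", "A", NA, "B", NA, "F")

# ids whose cluster is missing
missing_ids <- id[is.na(cluster)]   # "A" "D" "F"

# Position of each missing id's first occurrence in `to`
pos <- match(missing_ids, to)       # 3 2 7

# Cluster values of the rows that point to those ids
cluster[pos]                        # 5 2 3
```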

Data

df <- structure(list(id = structure(1:7, .Label = c("A", "B", "C", 
"D", "E", "F", "G"), class = "factor"), cluster = c(NA, 2L, 5L, 
NA, 5L, NA, 3L), from = structure(c(NA, 1L, 2L, NA, 3L, NA, 4L
), .Label = c("B", "C", "E", "G"), class = "factor"), to = structure(c(NA, 
3L, 1L, NA, 2L, NA, 4L), .Label = c("A", "B", "D", "F"), class = "factor")), 
class = "data.frame", row.names = c(NA, -7L))

Answer 1 (score: 1)

Using @Ronak Shah's logic (base R solution):

df$new_col <-  ifelse(is.na(df$cluster), df$cluster[match(df$id, df$to)], df$cluster)
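Since the question is tagged dplyr, the same lookup can also be written with mutate() and coalesce(). This is a sketch assuming the dplyr package is installed; coalesce() keeps cluster where it is present and falls back to the matched value where it is NA, which is equivalent to the ifelse() above.

```r
suppressPackageStartupMessages(library(dplyr))

df <- data.frame(id = c("A", "B", "C", "D", "E", "F", "G"),
                 cluster = c(NA, 2L, 5L, NA, 5L, NA, 3L),
                 from = c(NA, "B", "C", NA, "E", NA, "G"),
                 to = c(NA, "D", "A", NA, "B", NA, "F"),
                 stringsAsFactors = FALSE)

# Keep cluster when present; otherwise use the cluster of the row whose `to` equals this id
df <- df %>%
  mutate(New_col = coalesce(cluster, cluster[match(id, to)]))

df
```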

Answer 2 (score: 0)

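Here is another option using a for loop and which.

When the "cluster" column is NA, use which to find the index of the row whose "to" value matches the current "id".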

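df$new_col <- df$cluster
for (i in seq_along(df$cluster)) {
  # For NA clusters, borrow the cluster of the first row whose "to" equals this id (NA if none)
  if (is.na(df$cluster[i])) df$new_col[i] <- df$cluster[which(df$to == df$id[i])][1]
}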

Data

df <- data.frame(id= c("A", "B", "C","D", "E", "F", "G"),
                 cluster = c(NA, 2L, 5L,NA, 5L, NA, 3L),
                 from =c(NA, "B", "C",NA, "E", NA, "G"),
                 to = c(NA, "D", "A", NA, "B", NA, "F"), stringsAsFactors = FALSE)

Output

df
  id cluster from   to new_col
1  A      NA <NA> <NA>       5
2  B       2    B    D       2
3  C       5    C    A       5
4  D      NA <NA> <NA>       2
5  E       5    E    B       5
6  F      NA <NA> <NA>       3
7  G       3    G    F       3
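The loop and the vectorized match() approach from the other answers should agree. A quick check, using this answer's data plus a hypothetical row "H" (not in the question) whose cluster is NA and whose id no row points to, shows that both return identical columns and leave such a row as NA.

```r
# Answer 2's data plus a hypothetical row "H" that no `to` value references
df <- data.frame(id = c("A", "B", "C", "D", "E", "F", "G", "H"),
                 cluster = c(NA, 2L, 5L, NA, 5L, NA, 3L, NA),
                 from = c(NA, "B", "C", NA, "E", NA, "G", NA),
                 to = c(NA, "D", "A", NA, "B", NA, "F", NA),
                 stringsAsFactors = FALSE)

# Loop version: first matching row via which(); [1] on an empty result gives NA
loop_col <- df$cluster
for (i in seq_len(nrow(df))) {
  if (is.na(df$cluster[i])) {
    loop_col[i] <- df$cluster[which(df$to == df$id[i])][1]
  }
}

# Vectorized version: match() returns NA when the id never appears in `to`
vec_col <- ifelse(is.na(df$cluster), df$cluster[match(df$id, df$to)], df$cluster)

identical(loop_col, vec_col)  # TRUE
loop_col                      # 5 2 5 2 5 3 3 NA
```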