How to match a value in one column and extract the matching values from other columns in R

Posted: 2015-12-14 01:39:12

Tags: r

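I'm really stuck on this problem in R. I need to grepl for allgood in column C. Where it is present, I want to take the values of columns A and B from the row holding the nearest preceding allbad in column C and fill them into the allgood row to get the result.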

Example data:

A          B            C
apple      ball         allbad-cat
                        allgood-car
dog        bark         allbad-pet
                        bull
                        dull 
                        allgood-pet        

2 answers:

Answer 0 (score: 2)

# find the index of column "C" starts with "allgood"
good.idx <- which(grepl("^allgood", df$C))
# find the index of column "C" starts with "allbad"
bad.idx <- which(grepl("^allbad", df$C))

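# for each "good" index, find the maximum "bad" index smaller than the "good" index
# (assumes every "allgood" row has an earlier "allbad" row; otherwise max() returns -Inf with a warning)
good.bad.near <- sapply(good.idx, function(x){
    return(max(bad.idx[bad.idx<x]))
})

# copy A and B from the nearest preceding "allbad" row into each "allgood" row
df$A[good.idx] <- df$A[good.bad.near]
df$B[good.idx] <- df$B[good.bad.near]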

This works on your data.

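If you want to replace more columns, you can use a for loop.

for (i in 1:2) {  # columns 1 and 2 are A and B; widen the range to copy more columns
  df[, i][good.idx] <- df[, i][good.bad.near]
}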
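The same idea can also be written without sapply() and max(). The sketch below is a variant, not the answerer's code: it swaps in base R's findInterval(), which for a sorted vector such as bad.idx returns how many of its entries are less than or equal to each good index, i.e. the position of the nearest preceding allbad row (0 when there is none). The names pos, keep, src, dst and cols are illustrative only, and the data frame is rebuilt from the question's example so the block runs on its own.

```r
# Example data from the question (same values as df1 in the Data section)
df <- data.frame(
  A = c("apple", "", "dog", "", "", ""),
  B = c("ball", "", "bark", "", "", ""),
  C = c("allbad-cat", "allgood-car", "allbad-pet", "bull", "dull", "allgood-pet"),
  stringsAsFactors = FALSE
)

good.idx <- which(grepl("^allgood", df$C))
bad.idx  <- which(grepl("^allbad",  df$C))

# findInterval() counts how many entries of the sorted bad.idx are <= each
# good index; that count is the position (in bad.idx) of the nearest
# preceding "allbad" row, or 0 if there is none.
pos  <- findInterval(good.idx, bad.idx)
keep <- pos > 0          # skip "allgood" rows with no earlier "allbad" row
dst  <- good.idx[keep]
src  <- bad.idx[pos[keep]]

cols <- c("A", "B")      # columns to copy (illustrative); add more names as needed
df[dst, cols] <- df[src, cols]
df
```

Selecting columns by name in cols replaces the for loop over column positions, and the keep filter avoids the -Inf index that max() produces for an allgood row with no earlier allbad row.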

Answer 1 (score: 2)

We can try

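library(zoo)  # for na.locf()
# TRUE for rows whose C is neither "allbad..." nor "allgood..."; A and B stay blank there
i1 <- !grepl('^(allbad|allgood)', df1$C)
# turn "" into NA, carry the last non-NA value forward, then blank out the i1 rows
df1[1:2] <-  lapply(df1[1:2], function(x) ifelse(i1, '', 
                           na.locf(replace(x, x=='', NA))))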
df1
#      A    B           C
#1 apple ball  allbad-cat
#2 apple ball allgood-car
#3   dog bark  allbad-pet
#4                   bull
#5                   dull
#6   dog bark allgood-pet

Or with data.table

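library(data.table)
library(zoo)  # na.locf() comes from zoo
# start again from the original df1 (see Data) and reuse i1 from above
setDT(df1)[A == '', c('A', 'B') := NA][,   # empty A/B cells become NA
           lapply(.SD, na.locf)][          # carry the last non-NA value forward
           i1, c('A', 'B') := ''][]        # blank the rows flagged by i1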
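If you would rather not depend on zoo, the fill-down step can be done in base R. This is a sketch, not part of either answer: fill_down() is a hypothetical helper that uses cummax() over the positions of non-empty cells to find, for every row, the index of the most recent non-empty value. The rest mirrors the zoo answer (the same i1 mask, the same ifelse()), and the example data is rebuilt so the block runs on its own.

```r
# Example data from the question (same values as df1 in the Data section)
df1 <- data.frame(
  A = c("apple", "", "dog", "", "", ""),
  B = c("ball", "", "bark", "", "", ""),
  C = c("allbad-cat", "allgood-car", "allbad-pet", "bull", "dull", "allgood-pet"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-empty value forward (base R only).
fill_down <- function(x) {
  # index of the most recent non-empty element at each position (0 before the first one)
  idx <- cummax(ifelse(x != "", seq_along(x), 0L))
  out <- x
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

# Same logic as the zoo answer: rows that are neither allbad nor allgood stay blank
i1 <- !grepl("^(allbad|allgood)", df1$C)
df1[c("A", "B")] <- lapply(df1[c("A", "B")], function(x) ifelse(i1, "", fill_down(x)))
df1
```

Leading empty cells simply stay empty instead of being dropped (zoo's na.locf() removes leading NAs by default), so fill_down() always returns a vector of the same length as its input.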

Data

df1 <- structure(list(A = c("apple", "", "dog", "", "", ""), 
B = c("ball", 
"", "bark", "", "", ""), C = c("allbad-cat", "allgood-car", 
"allbad-pet", 
"bull", "dull", "allgood-pet")), .Names = c("A", "B", "C"), 
 class = "data.frame", row.names = c(NA, -6L))