Applying a regular expression across data frames and updating a column

Time: 2018-09-17 19:42:42

Tags: r for-loop apply lapply

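I have two data frames: table A is a pattern table with reference names, and table B is a table of old names. I want to subset table B to the rows that match a pattern in table A and, when a cell matches, fill the new column in B with the corresponding reference name from A.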

I have looked at "apply regexp in one data frame based on the column in another data frame", but it does not solve this case.

A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"), 
                ref = c("first", "second", "third", "forth"))
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"))
B$new = ""

I would like my result table to be:

name       new
cab        first
abed       second
ccaa       third
ddd        forth
ebba       second

I am trying:

for (i in 1:nrow(B)) {
  if (as.data.table(unlist(lapply(A$pattern, grepl, B$name))) == TRUE) {
    B$new[i] = A$update
  }
}

Does anyone know a better solution? I would prefer to use the apply family, but I don't know how to add the column. Any help is appreciated!

3 answers:

Answer 0: (score: 1)

I edited the answer because I forgot to include the line that converts B to a matrix first:

B <- as.matrix(B,ncol=1) 

It should now work correctly:

library(reshape2)
L <- apply(A, 1, function(x) B[grepl(x[1],B),])
names(L) <- A$ref
result <- melt(L)
colnames(result) <- c('Name','New')

    result
#  Name    New
#1  cab  first
#2 abed  first
#3 abed second
#4 ebba second
#5 ccaa  third
#6  ddd  forth

Answer 1: (score: 1)

You can use stack together with sapply:

stack(setNames(sapply(A$pattern,grep,B$name,value=T),A$ref))

  values    ind
1    cab  first
2   abed  first
3   abed second
4   ebba second
5   ccaa  third
6    ddd  forth

You can also use stack(setNames(Vectorize(grep)(A$pattern,B[1],value=T),A$ref))
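
Note that the results above list abed twice, because it matches both "ab" and "be|eb", whereas the expected table in the question has one row per name. The following is a minimal base R sketch of one way to get a single row per name. It assumes that the last matching pattern in A should win; that rule is only inferred from the expected output (where abed maps to second) and is not stated in the question.

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "forth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# For each name, collect the ref of every matching pattern and keep the last one.
# Assumption: "last match wins"; names with no match get NA.
B$new <- vapply(B$name, function(nm) {
  hits <- A$ref[vapply(A$pattern, grepl, logical(1), x = nm)]
  if (length(hits) > 0) hits[length(hits)] else NA_character_
}, character(1), USE.NAMES = FALSE)

# Keep only the rows of B that matched at least one pattern
result <- B[!is.na(B$new), ]
result
```

The rows stay in the order they have in B. If the first match should win instead, replacing hits[length(hits)] with hits[1] should be enough.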

Answer 2: (score: 0)

library(reshape2) # provides melt()

# Your data
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"), 
            ref = c("first", "second", "third", "fourth"), stringsAsFactors = F)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed" ,"ddd", "ebba"), stringsAsFactors = F)

patternfind <- function(i){
  ifelse(grepl(A$pattern[[i]], B$name), A$ref[[i]], NA) 
} # grepl function for your apply

m = sapply(seq_along(A$pattern), patternfind) # apply function 

test <- cbind(B,m) #bind your pattern matrix to B
melt(test, id = c("name"), value.name = "new", na.rm = T) # melt data for output

   name variable    new
3   cab        1  first
5  abed        1  first
12 abed        2 second
14 ebba        2 second
18 ccaa        3  third
27  ddd        4  fourth
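
The sapply and melt steps above amount to building a name-by-pattern match matrix and then listing its matching cells. As a rough illustration, here is a base R sketch of the same idea without reshape2. The names hits, idx and long are made up for this example and do not come from the answer.

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "fourth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# Logical matrix: one row per name in B, one column per pattern in A
hits <- sapply(A$pattern, grepl, x = B$name)

# Row/column positions of every TRUE cell, in column-major order
idx <- which(hits, arr.ind = TRUE)

# One output row per (name, pattern) match
long <- data.frame(name = B$name[idx[, "row"]],
                   new = A$ref[idx[, "col"]],
                   stringsAsFactors = FALSE)
long
```

As with melt, a name that matches several patterns (abed here) appears once per match.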

If you want to go the data.table route:

library(data.table)

DT.A <- as.data.table(A) # set as data tables
DT.B <- as.data.table(B)

ab <- DT.A[, DT.B[grep(pattern, name)], by=.(pattern, new = ref)] # use grep and by, leave out pattern if don't need to see what matched
ab <- ab[, c(3, 2, 1)] # reorder to your desired order (assigned so the next line sees the new order)
ab[,1:2] # subset to remove the pattern if you decide you don't want to display it

   name    new
1:  cab  first
2: abed  first
3: abed second
4: ebba second
5: ccaa  third
6:  ddd  fourth