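I have two data frames: table A is a pattern table with reference names, and table B is a table of old names. I want to subset table B to the rows that match a pattern in table A and, where a row matches, fill the new column in B with the corresponding value from A.
I have looked at "apply regexp in one data frame based on the column in another data frame", but it does not solve this case.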
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "forth"))
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"))
B$new = ""
I would like my result table to be:
name new
cab first
abed second
ccaa third
ddd forth
ebba second
This is what I am trying:
for (i in 1:nrow(B)) {
if (as.data.table(unlist(lapply(A$pattern, grepl, B$name))) == TRUE) {
B$new[i] = A$update
}
}
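Does anyone know a better solution? I would prefer to use the apply family, but I don't know how to add the column. Any help is appreciated!
Answer 0: (score: 1)
I edited the answer because I forgot to include the line that converts B to a matrix first: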
B <- as.matrix(B["name"]) # one-column character matrix; leaves out the empty "new" column
It should work now:
library(reshape2)
L <- apply(A, 1, function(x) B[grepl(x[1],B),])
names(L) <- A$ref
result <- melt(L)
colnames(result) <- c('Name','New')
result
# Name New
#1 cab first
#2 abed first
#3 abed second
#4 ebba second
#5 ccaa third
#6 ddd forth
Answer 1: (score: 1)
You can use stack together with sapply:
stack(setNames(sapply(A$pattern, grep, B$name, value = TRUE), A$ref))
values ind
1 cab first
2 abed first
3 abed second
4 ebba second
5 ccaa third
6 ddd forth
You can also use stack(setNames(Vectorize(grep)(A$pattern, B[1], value = TRUE), A$ref))
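For readers who want the columns named as in the question, here is a minimal sketch of the same stack idea, spelled out with lapply. It assumes the question's data with the missing comma in B fixed; the names matches and result are only illustrative. Note that "abed" appears twice because it matches both "ab" and "be|eb".

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "forth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# One character vector of matching names per pattern, labelled by ref
matches <- lapply(A$pattern, grep, x = B$name, value = TRUE)
names(matches) <- A$ref

# stack() turns the named list into a two-column data frame (values, ind)
result <- stack(matches)
names(result) <- c("name", "new")
result$new <- as.character(result$new) # ind comes back as a factor
result
```

If each name should appear only once, the rows would have to be deduplicated afterwards, for example by keeping the first match per name.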
Answer 2: (score: 0)
library(reshape2) # provides melt()
# Your data
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "fourth"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"), stringsAsFactors = FALSE)
patternfind <- function(i) {
  ifelse(grepl(A$pattern[[i]], B$name), A$ref[[i]], NA)
} # returns A$ref[i] where pattern i matches a name, NA otherwise
m <- sapply(seq_along(A$pattern), patternfind) # one column per pattern
test <- cbind(B, m) # bind the pattern matrix to B
melt(test, id.vars = "name", value.name = "new", na.rm = TRUE) # long format; unmatched cells are dropped
name variable new
3 cab 1 first
5 abed 1 first
12 abed 2 second
14 ebba 2 second
18 ccaa 3 third
27 ddd 4 fourth
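If you would rather keep one row per name and fill B$new in place, as the question describes, the match matrix can be collapsed row by row instead of melted. This is only a sketch under the same data; the names hits and matched are illustrative, and joining multiple matches with a comma is an assumption about what you want when a name matches more than one pattern.

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "fourth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# 7 x 4 logical matrix: rows are the names in B, columns are the patterns in A
hits <- sapply(A$pattern, grepl, x = B$name)

# For each name, paste together the refs of every pattern that matched
B$new <- apply(hits, 1, function(h) paste(A$ref[h], collapse = ","))

# Keep only the names that matched at least one pattern
matched <- B[B$new != "", ]
matched
```

Names with no match end up with an empty string in new, so the final subset drops them; "abed" gets both of its refs in one cell.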
If you want to go the data.table route:
library(data.table)
DT.A <- as.data.table(A) # set as data tables
DT.B <- as.data.table(B)
ab <- DT.A[, DT.B[grep(pattern, name)], by=.(pattern, new = ref)] # use grep and by, leave out pattern if don't need to see what matched
ab <- ab[, c(3, 2, 1)] # reorder to your desired order (name, new, pattern); assign so the next line sees it
ab[,1:2] # subset to remove the pattern if you decide you don't want to display it
name new
1: cab first
2: abed first
3: abed second
4: ebba second
5: ccaa third
6: ddd fourth