I want to check a word (in a column of a data frame) against 4 lists (a, b, c, d):
if df$word is in a then df$code <- 1
if df$word is in b then df$code <- 2
if df$word is in c then df$code <- 3
if df$word is in d then df$code <- 4
if df$word is in a & b then df$code <- 1 2
if df$word is in a & c then df$code <- 1 3
if df$word is in a & d then df$code <- 1 4
if df$word is in b & c then df$code <- 2 3
if df$word is in b & d then df$code <- 2 4
if df$word is in c & d then df$code <- 3 4
etc.
What is the most efficient way to do this?
Example:
df <- data.frame(word = c("book", "worm", "digital", "context"))
a <- c("book", "context")
b <- c("book", "worm", "context")
c <- c("digital", "worm", "context")
d <- c("context")
Expected output:
book 1 2
worm 2 3
digital 3
context 1 2 3 4
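Answer 0 (score: 2)
We can use a double sapply loop: for each element of the data frame column, we check which list elements contain it and collect the corresponding list numbers.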
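lst <- list(a, b, c, d)
# For each word, find the indices of the lists that contain it.
# %in% tests exact membership; grepl(x, y) would also match substrings
# (e.g. "book" inside "bookworm") and treat x as a regular expression.
df$output <- sapply(as.character(df$V1), function(x)
  paste0(which(sapply(lst, function(y) x %in% y)), collapse = ","),
  USE.NAMES = FALSE)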
df
# V1 output
#1 book 1,2
#2 worm 2,3
#3 digital 3
#4 context 1,2,3,4
Data
df <- read.table(text = "book
worm
digital
context")
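The same lookup can also be expressed without the nested loop by building a logical membership matrix first. The following is only a minimal base R sketch of that idea, not part of the answer above; the names words, member and code are illustrative, and the data simply mirrors the example in the question.

```r
# Example data, mirroring the question (names are illustrative)
words <- c("book", "worm", "digital", "context")
lst <- list(
  c("book", "context"),
  c("book", "worm", "context"),
  c("digital", "worm", "context"),
  c("context")
)

# Logical matrix: one row per word, one column per list.
# cbind() keeps the result a matrix even when there is only one word.
member <- do.call(cbind, lapply(lst, function(y) words %in% y))

# For each row, paste together the column indices that are TRUE
code <- apply(member, 1, function(r) paste(which(r), collapse = " "))
code
```

Each column of member is computed with one vectorized %in% call per list, so the number of membership calls depends on the number of lists rather than on the number of words.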
Answer 1 (score: 1)
Try this:
df <- data.frame(x = c("book", "worm", "digital", "context"))
a <- c("book", "context")
b <- c("book", "worm", "context")
c <- c("digital", "worm", "context")
d <- c("context")
# Append the number of every list that contains x
anno <- function(x) {
  rslt <- ""
  if (x %in% a) rslt <- paste0(rslt, " 1")
  if (x %in% b) rslt <- paste0(rslt, " 2")
  if (x %in% c) rslt <- paste0(rslt, " 3")
  if (x %in% d) rslt <- paste0(rslt, " 4")
  # Strip the leading space left by the first match
  return(stringr::str_trim(rslt))
}
df$code <- sapply(df$x, anno)
df
#> x code
#> 1 book 1 2
#> 2 worm 2 3
#> 3 digital 3
#> 4 context 1 2 3 4
Created on 2018-08-17 by the reprex package (v0.2.0.9000).
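Answer 2 (score: 1)
This can also be done in two steps using data.table: reshape the lists into one long table, then aggregate while joining with df: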
library(data.table)
long <-setDT(melt(list(a, b, c, d), value.name = "word"))
long[setDT(df), on = "word", by = .EACHI, .(code = toString(L1))][]
      word       code
1:    book       1, 2
2:    worm       2, 3
3: digital          3
4: context 1, 2, 3, 4
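For readers without data.table, the two-step idea (reshape to long format, then aggregate per word) might look roughly like this in base R. This is a sketch under assumptions rather than a translation of the code above: the names words, long, codes and result are made up for illustration, and the column name L1 is reused only to mirror the data.table version.

```r
# Example data, mirroring the question (names are illustrative)
words <- c("book", "worm", "digital", "context")
lst <- list(
  c("book", "context"),
  c("book", "worm", "context"),
  c("digital", "worm", "context"),
  c("context")
)

# Step 1: long format, one row per (list number, word) pair
long <- data.frame(
  L1 = rep(seq_along(lst), lengths(lst)),
  word = unlist(lst, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Step 2: for each word, collapse the numbers of the lists containing it
codes <- tapply(long$L1, long$word, toString)

# Look the codes up in the original word order;
# a word found in no list would get NA here
result <- data.frame(
  word = words,
  code = as.vector(codes[words]),
  stringsAsFactors = FALSE
)
result
```

Because the long table is built list by list, the numbers within each code come out in ascending order without an explicit sort.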