Searching a data frame in R for a specific pattern of character values

Time: 2018-01-03 18:17:13

Tags: r dataframe

I created a character vector that contains some missing values, for example

bp <- rep(NA, 5)
bp[c(2,4)] <- c("sugar","milk")
bp

> bp
[1] NA  "sugar" NA  "milk" NA 

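I am looking for a way to use bp to search a larger data frame and find where the bp pattern occurs, with the NA positions allowed to be filled by any value.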

For example,

[1] any1  "sugar" any2  "milk" any3 
[2] any2  "sugar" any5  "milk" any1 
[3] any6  "sugar" any1  "milk" any3 
[4] any8  "sugar" any7  "milk" any6
[5] any1  "sugar" any2  "milk" any3 

Edit: part of the data frame looks like this

c("milk", "sugar", "sugar", "creme", "carw", "milk", "creme", "carw", 
"sugar", "carw", "creme", "sugar", "sugar", "milk", "milk", "creme", 
"sugar", "sugar", "carw", "carw", "carw", "milk", "sugar", "sugar", 
"carw", "sugar", "milk", "sugar", "creme", "carw", "carw", "carw", 
"creme", "carw", "carw", "creme", "creme", "milk", "carw", "milk", 
"milk", "creme", "creme", "creme", "milk", "milk", "creme", "carw", 
"carw", "milk", "milk", "creme", "creme", "carw", "carw", "milk", 
"sugar", "carw", "milk", "carw", "creme", "sugar", "sugar", "creme", 
"sugar", "sugar", "creme", "sugar", "carw", "sugar", "carw", 
"carw", "creme", "sugar", "milk", "milk", "carw", "carw", "milk", 
"creme", "sugar", "carw", "milk", "sugar", "sugar", "milk", "sugar", 
"creme", "milk", "milk", "carw", "milk", "sugar", "carw", "sugar", 
"carw", "creme", "creme", "carw", "milk", "milk", "milk", "milk", 
"carw", "carw", "milk", "milk", "carw", "sugar", "milk", "milk", 
"milk", "creme", "carw", "creme", "milk", "milk", "milk", "creme", 
"carw", "milk", "carw", "carw", "carw", "carw", "carw", "carw"
)

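I would normally use the following to search the whole data frame, but it is tricky in this case because bp contains NA values.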

library(data.table)

n1 <- length(bp)
bp.pos <- setDT(data.frame)[, which(Reduce(`&`, Map(`==`,
    shift(value1, seq(n1) - 1, type = "lead"), bp)))]
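To illustrate why the NA entries get in the way, here is a minimal base R sketch. The vector x_toy is made-up toy data, not the real data frame. Comparing anything to NA with == returns NA, so the combined test is never TRUE and which() returns nothing.

```r
# x_toy is hypothetical toy data used only for illustration
x_toy <- c("carw", "sugar", "creme", "milk", "carw")
bp    <- c(NA, "sugar", NA, "milk", NA)

cmp <- x_toy == bp   # NA wherever bp is NA, TRUE/FALSE elsewhere
all(cmp)             # NA, because TRUE & NA is NA
which(all(cmp))      # integer(0): which() silently drops NA

# Restricting the comparison to the non-NA positions of bp avoids this
keep <- !is.na(bp)
all(x_toy[keep] == bp[keep])   # TRUE
```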

Any help would be greatly appreciated.

1 answer:

Answer 0: (score: 1)

This is based on my understanding of your question. I call the vector you shared x.

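# Column i compares bp[i] with the i-th element of every length(bp) window of x
test = sapply(seq_along(bp), function(i) bp[i] == x[(0 + i):(length(x) - length(bp) + i)])
# Comparisons against NA give NA; treat those wildcard positions as matches
test = test | is.na(test)
# Window starts where every position matches
res = which(apply(test, 1, all))
# Expand each start into the indices it covers, then pull out those values
res = lapply(res, function(s) s + seq_along(bp) - 1)
final = lapply(res, function(z) x[z])
# Name each match by its starting index in x
names(final) = lapply(res, "[", 1)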

# $`11`
# [1] "creme" "sugar" "sugar" "milk"  "milk" 
# 
# $`12`
# [1] "sugar" "sugar" "milk"  "milk"  "creme"
# 
# $`56`
# [1] "milk"  "sugar" "carw"  "milk"  "carw" 
# 
# $`73`
# [1] "creme" "sugar" "milk"  "milk"  "carw" 
# 
# $`80`
# [1] "creme" "sugar" "carw"  "milk"  "sugar"
# 
# $`83`
# [1] "milk"  "sugar" "sugar" "milk"  "sugar"
# 
# $`86`
# [1] "milk"  "sugar" "creme" "milk"  "milk" 
# 
# $`108`
# [1] "carw"  "sugar" "milk"  "milk"  "milk" 

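The result is a named list in which the names are the starting indices in x and the values are the matched vectors. This gives you both the "where" and the matches in a single object.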
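If only the starting positions are needed, the same idea can be wrapped in a small function that compares just the non-NA positions of the pattern. This is a sketch under stated assumptions rather than a definitive implementation: find_pattern and x_demo are hypothetical names introduced here for illustration. One difference from the code above is that an NA in the searched vector at a fixed position is not counted as a match here.

```r
# find_pattern() is a hypothetical helper, not part of any package
find_pattern <- function(x, pattern) {
  n <- length(pattern)
  if (length(x) < n) return(integer(0))
  starts <- seq_len(length(x) - n + 1)   # every possible window start
  fixed  <- which(!is.na(pattern))       # positions that must match exactly
  ok <- rep(TRUE, length(starts))
  for (k in fixed) {
    # compare pattern[k] with the k-th element of every window at once
    ok <- ok & (x[starts + k - 1] == pattern[k])
  }
  starts[which(ok)]                      # which() drops NA comparisons
}

# x_demo is made-up toy data
x_demo <- c("carw", "sugar", "creme", "milk", "carw",
            "sugar", "milk", "milk", "creme")
bp <- c(NA, "sugar", NA, "milk", NA)
find_pattern(x_demo, bp)
# [1] 1 5
```

Indexing back into the vector, for example x_demo[5 + seq_along(bp) - 1], recovers the matched window for a given start.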