Question

I have strings of numbers that are not all the same length, for example:

0,0,1,2,1,0,0,0

1,1,0,1

2,1,2,0,1,0

I have imported these into a data frame in R. The three strings above, for example, give the following three rows (I call it df):

[Image: the three rows of df]
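The question does not show how df was built. Here is a minimal sketch. It assumes the strings are held in a character vector and that shorter rows are padded with NA, which the NA example in Answer 2 suggests. The names strings, vals and mat are hypothetical:

```r
# The three example strings from the question
strings <- c("0,0,1,2,1,0,0,0", "1,1,0,1", "2,1,2,0,1,0")
# Split each string on commas and convert the pieces to numbers
vals <- lapply(strsplit(strings, ","), as.numeric)
n <- max(lengths(vals))
# Pad every row with NA up to the longest length; one row per string
mat <- t(vapply(vals, function(v) c(v, rep(NA_real_, n - length(v))), numeric(n)))
df <- as.data.frame(mat)   # columns are named V1 ... V8
df
```

read.table() with fill = TRUE is a common alternative. However, it decides the number of columns from the first five lines of input, so padding by hand is more predictable when a long row appears late.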

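I would like to write some functions that help me understand the data. As a starting point: given a numeric vector x, I want a 'process' P that gives the number of rows containing x as a contiguous subvector. For example, if x = c(2,1) then P(x) = 2, if x = c(0,0,0) then P(x) = 1, and if x = c(1,3) then P(x) = 0. I have more questions like this one, but I hope I can take the logic from this question and solve some of the others myself.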

Answer 1

EDIT: a regex-based approach:

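match.regex <- function(x, data) {
  # Wrap the pattern in "_" so that, e.g., 1 cannot match inside 21 or 10
  xs <- paste0("_", paste(x, collapse = "_"), "_")
  # Collapse each row into one "_"-separated string, wrapped the same way
  dats <- paste0("_", apply(data, 1, paste, collapse = "_"), "_")
  # Count rows whose string contains the pattern (literal match, so "." is not a wildcard)
  sum(grepl(xs, dats, fixed = TRUE))
}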


> match.regex(c(1),dat)
[1] 3
> match.regex(c(0,0,0),dat)
[1] 1
> match.regex(c(1,2),dat)
[1] 2
> match.regex(5,dat)
[1] 0
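Joining values with "_" is safe for single-digit data like the question's. With multi-digit values, though, a bare pattern can match across value boundaries. The quick check below uses made-up strings to show the problem and the usual fix, which is to add the separator at both ends:

```r
# Separator only between values: "1_2" also occurs inside "11_22"
grepl("1_2", "11_22")                        # TRUE, a false positive
# Separator at both ends of pattern and row: only whole values match
grepl("_1_2_", "_11_22_", fixed = TRUE)      # FALSE
grepl("_1_2_", "_0_1_2_0_", fixed = TRUE)    # TRUE
```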

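Surprisingly, this is faster than the other methods given here. It is roughly twice as fast as my own solution, on both small and large datasets. Regexes are evidently well optimized (timed with benchmark() from the rbenchmark package):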

> benchmark(matching(c(1,2),dat),match.regex(c(1,2),dat),replications=1000)
                       test replications elapsed relative 
2 match.regex(c(1, 2), dat)         1000    0.15      1.0 
1    matching(c(1, 2), dat)         1000    0.36      2.4

A more vectorized approach that gives you the count directly:

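matching.row <- function(x, row) {
  nx <- length(x)
  # Candidate start positions: where the row equals the first element of x
  sid <- which(x[1] == row)
  # TRUE if the length-nx window starting at any candidate equals x;
  # windows that run past the row end or into NA give FALSE via isTRUE()
  any(vapply(sid, function(i) isTRUE(all(row[seq(i, i + nx - 1)] == x)),
             logical(1)))
}

matching <- function(x, data) {
  # apply() hands each row of data to matching.row(); each TRUE counts as 1
  sum(apply(data, 1, function(r) matching.row(x, r)))
}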

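For each row, which() finds the positions where the row equals the first element of x. At each of those candidate positions, a window of length(x) is cut from the row and compared with x. A row matches if any of its windows equals x, and the number of rows that return TRUE is the count you want. Windows that run past the end of a row or into NA padding never count as a match.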

> matching(c(1),dat)
[1] 3
> matching(c(0,0,0),dat)
[1] 1
> matching(c(1,2),dat)
[1] 2
> matching(5,dat)
[1] 0
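The trace below shows the window comparison on a single row. It uses the third example row with x = c(1, 2), and it assumes the row is padded with NA to eight columns. The names windows and hits are only for illustration:

```r
# Third example row, assumed NA-padded to eight columns
row <- c(2, 1, 2, 0, 1, 0, NA, NA)
x <- c(1, 2)
# Candidate start positions: where the row equals x[1]
sid <- which(row == x[1])
sid                                   # 2 5
# The window of length(x) starting at each candidate
windows <- lapply(sid, function(i) row[i:(i + length(x) - 1)])
# Compare each window with x element by element
hits <- vapply(windows, function(w) isTRUE(all(w == x)), logical(1))
hits                                  # TRUE FALSE
any(hits)                             # TRUE: this row contains c(1, 2)
```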

Answer 2

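You need to apply the function is.sub.array over the rows of your data:

apply(dat, MARGIN = 1, FUN = is.sub.array, x = c(2,1))

where dat is your data.frame and is.sub.array is a function that checks whether x is contained in a bigger vector (in practice, the rows of your data.frame).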

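I don't know of any existing is.sub.array function, so this is how I would write it:

is.sub.array <- function(x, y) {
  # After step i, j[k] is TRUE when y[(k-i+1):k] matches x[1:i]
  j <- rep(TRUE, length(y))
  for (i in seq_along(x)) {
    # Shift the candidate end positions one place to the right
    if (i > 1) j <- c(FALSE, head(j, -1))
    # Keep only positions where y equals x[i], compared with all.equal
    j <- j & vapply(y, FUN = function(a, b) isTRUE(all.equal(a, b)),
                    FUN.VALUE = logical(1), b = x[i])
  }
  # TRUE if x occurs somewhere in y as a contiguous run
  return(sum(j, na.rm = TRUE) > 0L)
}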
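The loop in is.sub.array keeps a logical vector j over the positions of y. For each further element of x, it shifts j right by one place and then ANDs it with the new comparison. Below is a short trace for x = c(1, 2). To keep it readable, it uses plain == instead of all.equal(), so it is a simplification rather than the answer's exact code:

```r
y <- c(2, 1, 2, 0, 1, 0)   # third example row, without NA padding
x <- c(1, 2)
# Step 1: j marks positions where y equals x[1]
j <- y == x[1]
j                           # FALSE TRUE FALSE FALSE TRUE FALSE
# Step 2: shift j right by one, then also require y == x[2]
j <- c(FALSE, head(j, -1)) & (y == x[2])
j                           # FALSE FALSE TRUE FALSE FALSE FALSE
# A TRUE at position k means a copy of x ends at position k of y
which(j)                    # 3
```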

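(The advantage of using all.equal is that it can compare numeric (floating-point) values up to a small tolerance, which regexes cannot do.)

Here are a few examples:

apply(dat, 1, is.sub.array, x = c(1, 2))
# [1]  TRUE FALSE  TRUE
apply(dat, 1, is.sub.array, x = c(0, 0, 0))
# [1]  TRUE FALSE FALSE
apply(dat, 1, is.sub.array, x = as.numeric(c(NA, NA)))
# [1] FALSE  TRUE  TRUE

Note: all.equal is sensitive to your data type, so be sure to use an x of the same type as your data (integer or numeric).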
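Here are two quick checks of the all.equal() behaviour this answer relies on. First, it compares doubles within a tolerance (by default sqrt(.Machine$double.eps), about 1.5e-8). Second, it does not treat a logical NA as equal to a numeric one, which is why the NA example converts x with as.numeric():

```r
a <- 0.1 + 0.2
a == 0.3                                  # FALSE: exact floating-point comparison
isTRUE(all.equal(a, 0.3))                 # TRUE: equal within the default tolerance
# c(NA, NA) is logical, so its NA does not compare equal to a numeric NA
isTRUE(all.equal(NA_real_, NA))           # FALSE
isTRUE(all.equal(NA_real_, NA_real_))     # TRUE
```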

Searching for rows in a data frame in R
