Question

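I want to use the gregexpr function to find the start and end positions of a substring within a string. The function works fine at the console, but I can't access the starting positions or the match lengths in the result: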

g <- gregexpr("e", "cheese")

g

[[1]]
[1] 3 4 6
attr(,"match.length")
[1] 1 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

g[[1]][1] only shows the first value (3), but I need to create a vector containing all of the starting positions and the length values. Thanks.

Answer 1

You can use unlist, which returns a vector of the positions. If you only need the first and the last position, you can use min and max.

unlist(g)

[1] 3 4 6
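
As a minimal sketch of the min/max suggestion, assuming the same "cheese" example from the question: unlist flattens the result to a plain integer vector, so the match.length attribute is no longer attached to it. The variable names starts, first and last are only illustrative. When the pattern does not match, gregexpr reports -1, so the result is worth checking before use.

```r
g <- gregexpr("e", "cheese")

# unlist() flattens the one-element list into a plain integer vector;
# the match.length attribute is not carried over
starts <- unlist(g)

# first and last starting positions of the pattern
first <- min(starts)
last  <- max(starts)
```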

Answer 2

You can extract them like this:

g <- gregexpr("e", "cheese")

# one liner for : starts <- g[[1]]
#                 attributes(starts) <- NULL
starts <- `attributes<-`(g[[1]],NULL) 

lens <- attr(g[[1]],'match.length')

> starts
[1] 3 4 6
> lens
[1] 1 1 1

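Of course, this only works when the text argument has length 1 (as in the example, where it contains only "cheese"). Otherwise, you will need to iterate over the elements of g, using g[[2]], g[[3]], and so on.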
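
One way to do that iteration is sketched below with lapply. The input vector x and the names matches, starts, lens and keep are hypothetical and chosen only for illustration. The sketch also derives the end position as start + length - 1, since the question asks for start and end positions, and it drops the -1 placeholder that gregexpr returns for strings without a match.

```r
x <- c("cheese", "tree", "sky")   # hypothetical input with several strings
g <- gregexpr("e", x)

# one data frame of start, length and end positions per input string
matches <- lapply(g, function(m) {
  starts <- as.vector(m)               # as.vector() strips the attributes
  lens   <- attr(m, "match.length")
  keep   <- starts > 0                 # gregexpr() gives -1 when nothing matches
  data.frame(start  = starts[keep],
             length = lens[keep],
             end    = (starts + lens - 1L)[keep])
})
names(matches) <- x
```

Here matches[["cheese"]] holds the positions for the first string, and a string without any match yields a data frame with zero rows.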

Answer 3

Another way is:

g <- gregexpr("e", "cheese")

g[[1]][1:length(g[[1]])]
#[1] 3 4 6

And here is a microbenchmark against the unlist approach:

microbenchmark::microbenchmark(
   g[[1]][1:length(g[[1]])], 
   unlist(g)
)

#Unit: nanoseconds
#                     expr min  lq   mean median  uq   max neval
# g[[1]][1:length(g[[1]])] 378 378 653.80    379 756  8307   100
#                unlist(g)   0 378 544.32    378 378 15104   100

Accessing the results of gregexpr
