Match exactly one occurrence rather than consecutive occurrences

Time: 2018-03-02 05:50:48

Tags: r regex

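I have file names that include the directory path, as returned from list.files(..., full.names = T). I want to split a file name on / to recover the directory structure, but I want only single occurrences of / to count as separators. A plain split treats every / as a separator, e.g.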

strsplit("C://dir1/dir2/txt.R", "/")
# [[1]]
# [1] "C:"    ""      "dir1"  "dir2"  "txt.R"

whereas I would like the output to be:

[1] "C://"  "dir1"  "dir2"  "txt.R"

I saw this answer, which appears to give a regex solution; however, when I try to match a literal / I get an error:

> strsplit("C://dir1/dir2/txt.R", "\/")
Error: '\/' is an unrecognized escape in character string starting ""\/"

In fact, the regex from that example does not work in R as written:

> grepl('([\w\/]+)\/amp(\/\w+[-\/]\w+\/?)', '/name/amp/test-123')
Error: '\w' is an unrecognized escape in character string starting "'([\w"
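
For reference, both errors come from R's string parser rather than from the regex engine: in an ordinary R string literal a backslash must itself be escaped, so \w is written "\\w", and / needs no escape at all. A minimal sketch, assuming perl = TRUE so that the pattern is handled by the PCRE engine:

```r
# "/" is not special in R strings or in regex, so it needs no backslash.
parts <- strsplit("C://dir1/dir2/txt.R", "/")[[1]]

# Regex escapes such as \w need a doubled backslash inside an R string literal.
# perl = TRUE (PCRE) is an assumption made here, not part of the original attempt.
hit <- grepl("([\\w/]+)/amp(/\\w+[-/]\\w+/?)", "/name/amp/test-123", perl = TRUE)

print(parts)
print(hit)
```

This only removes the parse errors; the split still treats each / as a separator, which is the problem the answers address.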

4 Answers:

Answer 0: (score: 2)

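One option is to match runs of two or more / and SKIP them, while splitting on a single / that follows a word boundary, or on the word boundary that follows a /.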

strsplit("C://dir1/dir2/txt.R", "[/]{2,}(*SKIP)(*F)|\\b[/]|(?<=[/])\\b", perl = TRUE)[[1]]
#[1] "C://"  "dir1"  "dir2"  "txt.R"
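
To see what the (*SKIP)(*F) branch contributes on its own, here is a minimal sketch that uses a simplified pattern (an illustration, not the pattern from this answer) with gsub to mark only the slashes that survive the skip:

```r
x <- "C://dir1/dir2/txt.R"

# Runs of two or more slashes are matched and then rejected by (*SKIP)(*F),
# so only the remaining single slashes are replaced.
marked <- gsub("/{2,}(*SKIP)(*F)|/", "|", x, perl = TRUE)
print(marked)
# [1] "C://dir1|dir2|txt.R"
```

The backtracking verbs are PCRE syntax, so perl = TRUE is required.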

Answer 1: (score: 2)

Try this code:

strsplit("C://dir1/dir2/txt.R", "(?<=//)|(?<!/)/(?!/)", perl=TRUE)

See output here

Explanation

  • (?<=//) - matches a position immediately preceded by //
  • | - or
  • (?<!/)/(?!/) - matches a / that is not preceded by / and not followed by /
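
A minimal sketch of why both alternatives are needed, using the same example string: the single-slash branch finds the right separators, but on its own it leaves the drive prefix glued to the first directory.

```r
x <- "C://dir1/dir2/txt.R"

# Character positions of slashes that have no slash on either side.
single_pos <- as.integer(gregexpr("(?<!/)/(?!/)", x, perl = TRUE)[[1]])
print(single_pos)
# [1]  9 14

# Splitting on that branch alone keeps "C://" attached to "dir1".
single_only <- strsplit(x, "(?<!/)/(?!/)", perl = TRUE)[[1]]
print(single_only)
# [1] "C://dir1" "dir2"     "txt.R"
```

The (?<=//) branch adds the zero-width split point right after the double slash.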

Regex Demo

Answer 2: (score: 2)

KISS,

strsplit("C://dir1/dir2/txt.R", "\\b/\\b|(?<=//)", perl = TRUE)[[1]]
# [1] "C://"  "dir1"  "dir2"  "txt.R"

Answer 3: (score: 2)

A very simple approach is to match the pieces instead of splitting:

x <- "C://dir1/dir2/txt.R"
regmatches(x, gregexpr("[^/]+(?://)?", x))
#  or with stringr
library(stringr)
str_extract_all(x, "[^/]+(?://)?")
# [[1]]
# [1] "C://"  "dir1"  "dir2"  "txt.R"

See the regex demo and the R online demo.

Pattern details

  • [^/]+ - 1 or more characters other than /
  • (?://)? - an optional sequence of two / characters.

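Note: if you want to ignore // inside the path and only grab it at the start, you can add an alternative such as ^[[:alpha:]]:// or add the following lookbehind to the optional group: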

(?<=^[[:alpha:]]:)

See this and that regex demo.
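
A minimal sketch of how the lookbehind might be placed inside the optional group. The exact placement and the test path with a doubled slash in the middle are assumptions made for illustration, and perl = TRUE is used because lookbehinds are PCRE syntax:

```r
# Hypothetical path with a doubled slash inside it, used only for illustration.
x <- "C://dir1//dir2/txt.R"

# Original pattern: "//" is kept wherever it follows a path component.
everywhere <- regmatches(x, gregexpr("[^/]+(?://)?", x, perl = TRUE))[[1]]
print(everywhere)
# [1] "C://"   "dir1//" "dir2"   "txt.R"

# Assumed placement of the lookbehind: "//" is kept only right after
# a leading drive letter and colon.
start_only <- regmatches(x, gregexpr("[^/]+(?:(?<=^[[:alpha:]]:)//)?", x, perl = TRUE))[[1]]
print(start_only)
# [1] "C://"  "dir1"  "dir2"  "txt.R"
```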