Question

I have the following list of files:

a_file.csv another_file.csv a_third_file.csv

I want to write a function that extracts only the text before _file.csv, so that the strings above become:

a another a_third

How can I do this with stringr?

Answer 1

Only because you explicitly asked for it, here is a str_extract_all() solution. You need to use a so-called "positive lookahead".

library(stringr)

x <- c("a_file.csv", "another_file.csv", "a_third_file.csv")

str_extract_all(x, regex(".*(?=_file.csv)"))
#> [[1]]
#> [1] "a" "" 
#> 
#> [[2]]
#> [1] "another" ""       
#> 
#> [[3]]
#> [1] "a_third" ""
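The empty second element in each result appears because, once the prefix has been matched, the pattern can match again with zero width immediately before _file.csv. One way to avoid that, assuming every name ends in _file.csv, is to anchor the pattern at both ends and escape the dot so that it matches a literal period. The anchored pattern below is an illustrative variation, not part of the original answer.

```r
library(stringr)

x <- c("a_file.csv", "another_file.csv", "a_third_file.csv")

# ^ pins the match to the start of the string, so a second
# zero-width match is impossible; \\. matches a literal dot.
anchored <- str_extract_all(x, "^.*(?=_file\\.csv$)")

# Each list entry now holds exactly one match, so the list
# can be flattened into a character vector.
prefixes <- unlist(anchored)
prefixes
```

With the anchors in place, each list entry holds a single string and unlist() gives the three prefixes directly.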

@Joel's answer, i.e. str_split(), is of course more concise and also faster. I use fixed() here because we are matching a fixed string rather than a regular expression.

str_split(x, fixed("_file.csv"))
#> [[1]]
#> [1] "a" "" 
#> 
#> [[2]]
#> [1] "another" ""       
#> 
#> [[3]]
#> [1] "a_third" ""

Of course, base::strsplit() can do this as well, but note that the trailing empty strings are gone.

strsplit(x, "_file.csv", fixed = TRUE)
#> [[1]]
#> [1] "a"
#> 
#> [[2]]
#> [1] "another"
#> 
#> [[3]]
#> [1] "a_third"
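Because every split leaves the wanted prefix as the first piece, the list can be reduced to a plain character vector by taking the first element of each entry. A small sketch using only base R; the variable names are illustrative.

```r
x <- c("a_file.csv", "another_file.csv", "a_third_file.csv")

pieces <- strsplit(x, "_file.csv", fixed = TRUE)

# Take the first piece of each split; vapply() guarantees a
# character vector with one element per input string.
prefixes <- vapply(pieces, function(p) p[1], character(1))
prefixes
```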

IMO, returning a single character vector is cleaner. Three options:

str_extract() with a positive lookahead.

str_extract(x, regex(".*(?=_file.csv)"))
#> [1] "a"       "another" "a_third"

Instead of extracting the string you want, you can also replace/remove the part you don't want.

str_replace(x, fixed("_file.csv"), "")
#> [1] "a"       "another" "a_third"

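The same strategy with base::gsub() (note that gsub() replaces every occurrence, whereas str_replace() replaces only the first; base::sub() is the closer counterpart):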

gsub("_file.csv", "", x, fixed = TRUE)
#> [1] "a"       "another" "a_third"
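Since the question asks for a function, the replacement approach can be wrapped in one. A minimal sketch: strip_suffix and its suffix argument are hypothetical names chosen for illustration, and like str_replace() with fixed() it removes the first occurrence of the literal text wherever it appears in the string.

```r
library(stringr)

# Hypothetical helper: remove the first occurrence of a literal
# piece of text (by default "_file.csv") from each string.
strip_suffix <- function(x, suffix = "_file.csv") {
  str_replace(x, fixed(suffix), "")
}

x <- c("a_file.csv", "another_file.csv", "a_third_file.csv")
result <- strip_suffix(x)
result
```

Passing a different suffix, for example strip_suffix("report_v2.csv", "_v2.csv"), reuses the same helper for other naming schemes.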

Answer 2

You can use str_split():

str_split("a_file.csv", "_file.csv")

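This returns a list of the pieces obtained by splitting the string on the pattern "_file.csv". See the stringr documentation for str_split().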
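If a character vector is more convenient than a list, one option is str_split_fixed(), which returns a character matrix with one row per input string. The sketch below applies it to the file names from the question; wrapping the pattern in fixed() is an addition here, so that the dot is treated literally.

```r
library(stringr)

x <- c("a_file.csv", "another_file.csv", "a_third_file.csv")

# Split each name into exactly 2 pieces; the result is a
# character matrix with one row per input string.
parts <- str_split_fixed(x, fixed("_file.csv"), n = 2)

# The first column holds the text before "_file.csv".
prefixes <- parts[, 1]
prefixes
```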
