Question

I have a data frame in which one column contains entries in the following format:

/textIwant
/textIwant/otherstuff
/

I want to create a new column that extracts "textIwant". Should I use strsplit or a regex?

Answer 1

We can use str_extract to extract the first run of one or more characters that are not /

library(stringr)
str_extract(str1,  "[^/]+")
#[1] "textIwant"   "textIwant"   "abc-def-ghi" "abc-def-ghi"

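Or use sub from base R: skip the leading / with ^., match the characters that are not /, capture them as a group (([^/]+)), and replace the whole string with the backreference (\\1)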

sub("^.([^/]+).*", "\\1", str1)
#[1] "textIwant"   "textIwant"   "abc-def-ghi" "abc-def-ghi"
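
The question asks for a new column, and its sample data include a bare / entry with nothing to extract. A minimal base R sketch of that step, assuming a hypothetical data frame df with a character column path (both names are illustrative, not from the question), might look like this. It uses regexpr() and regmatches() rather than sub(), so that rows without a match become NA:

```r
# Hypothetical data frame; `df`, `path`, and `first_part` are illustrative names.
df <- data.frame(
  path = c("/textIwant", "/textIwant/otherstuff", "/"),
  stringsAsFactors = FALSE
)

# Locate the first run of non-slash characters; -1 means no match.
hits <- regexpr("[^/]+", df$path)

# Start with NA everywhere, then fill only the rows that matched.
df$first_part <- NA_character_
df$first_part[hits > 0] <- regmatches(df$path, hits)

df$first_part
#[1] "textIwant" "textIwant" NA
```

With the sub() call as written, the / row would come back unchanged as "/", because sub() returns its input when the pattern does not match. Whether NA or the original string is preferable depends on how the column is used afterwards.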

Answer 2

I would use

basename(str1)
#[1] "textIwant"   "otherstuff"  "abc-def-ghi" "abc-def-ghi"

where str1 is taken from akrun's example:

str1 <- c("/textIwant", "/textIwant/otherstuff", "/abc-def-ghi/", "/abc-def-ghi")

Answer 3

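Actually, strsplit() works too, with / as the delimiter. Take the second piece of each split, because the first piece is the empty string before the leading /.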

sapply(strsplit(str1, "/"), "[", 2)
# "textIwant"   "textIwant"   "abc-def-ghi" "abc-def-ghi"
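
As a hedged variation on the same idea, the sketch below swaps sapply() for vapply() and runs on the three sample entries from the question, including the bare / entry. The variable names (paths, pieces, second) are illustrative assumptions, not part of the original answer:

```r
# Sample paths from the question, including the bare "/" entry.
paths <- c("/textIwant", "/textIwant/otherstuff", "/")

# fixed = TRUE treats "/" as a literal string rather than a regex.
pieces <- strsplit(paths, "/", fixed = TRUE)

# Piece 1 is the empty string before the leading "/", so take piece 2.
# vapply() checks that every result is a single character value.
second <- vapply(pieces, function(p) p[2], character(1))

second
#[1] "textIwant" "textIwant" NA
```

The bare / splits into a single empty string, so indexing its second piece gives NA rather than an error.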