Question

I have a dataset, say

x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')

I want to remove everything before the last slash, so that the result looks like this:

my cat is handsome

I googled this code, which returns everything before the last slash:

gsub('(.*)/\\w+', '\\1', x)
[1] "test/test"     "et/tom"        "set/eat"       "sk / handsome"

How can I change this code so that it returns the other part of the string, the part after the last slash?

Thanks

Answer 1

You can use basename:

paste(trimws(basename(x)),collapse=" ")
# [1] "my cat is handsome"
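As a hedged aside on how basename differs from a regex here: basename treats its input as a file path, so it drops trailing slashes before taking the last component. The sketch below uses a hypothetical vector `paths` (not from the question) to contrast it with a plain `sub` call.

```r
# Hypothetical inputs, including one that ends in a slash
paths <- c('a/b/c', 'a/b/', 'sk / handsome')

# basename() strips trailing path separators first, then keeps the last component
via_basename <- trimws(basename(paths))

# sub() removes everything up to the last slash (plus any whitespace after it)
via_regex <- sub('.*/\\s*', '', paths)

print(via_basename)
print(via_regex)
```

For strings that never end in a slash, as in the question, both approaches agree.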

Answer 2

Using strsplit:

> sapply(strsplit(x, "/\\s*"), tail, 1)
[1] "my"       "cat"      "is"       "handsome"

Another way, with gsub:

> gsub("(.*/\\s*(.*$))", "\\2", x) # without 'unwanted' spaces
[1] "my"       "cat"      "is"       "handsome"

Using str_extract from the stringr package:

> library(stringr)
> str_extract(x, "\\w+$") # without 'unwanted' spaces
[1] "my"       "cat"      "is"       "handsome"

Answer 3

Basically, you can just move the parentheses in the regex you already found:

gsub('.*/ ?(\\w+)', '\\1', x)
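A related sketch, not taken from the answer above: because only the tail is wanted, the greedy prefix can also be replaced with an empty string, which avoids the capture group entirely. The name `last_part` is chosen here for illustration.

```r
x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')

# '.*' is greedy, so '.*/' consumes everything up to the LAST slash;
# '\\s*' also swallows any whitespace that follows that slash
last_part <- sub('.*/\\s*', '', x)

print(last_part)
print(paste(last_part, collapse = " "))
```

A string that contains no slash at all does not match the pattern and is returned unchanged.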

Answer 4

You can use

x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')
gsub('^(?:[^/]*/)*\\s*(.*)', '\\1', x)

which yields

[1] "my"       "cat"      "is"       "handsome"

To get a single string, you can paste:

(paste0(gsub('^(?:[^/]*/)*\\s*(.*)', '\\1', x), collapse = " "))

The pattern here is:

^            # start of the string
(?:[^/]*/)*  # a run of non-slash characters followed by a slash, 0+ times
\\s*         # optional whitespace
(.*)         # capture the rest of the string

The whole match is replaced by \\1, that is, by the contents of the first capture group.
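To see what the capture group holds without doing a replacement, the pattern breakdown above can be checked with regexec and regmatches. This is a minimal sketch; `perl = TRUE` is an assumption added here so that the `(?:...)` group is handled by the PCRE engine, and the names `m` and `captured` are illustrative.

```r
x <- c('test/test/my', 'et/tom/cat', 'set/eat/is', 'sk / handsome')
pattern <- '^(?:[^/]*/)*\\s*(.*)'

# For each string, regmatches() returns c(full match, capture group 1)
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Keep only the second element, i.e. the first capture group
captured <- vapply(m, function(parts) parts[2], character(1))

print(captured)
```

The captured values are the same strings that the gsub call substitutes in via \\1.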

In R, how to remove everything before the last slash

4 answers: