Question

My data frame contains usernames in the format

"John Smith (Company Department)"

I want to extract the department from the username and put it in its own separate column.

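I have tried the code below, but it fails if the username is

"John Smith (Company Department) John Doe)"

Can anyone help? Regex is not my strong point, and the code below breaks when the username is non-standard, as in my example above with an extra parenthesis.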

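library(stringr)

strcol <- "John Smith (FPO Sales) John Doe)"

# Locate the "(FPO " prefix and a closing parenthesis, then cut out the text in between
start_loc <- str_locate_all(pattern = '\\(FPO ', strcol)[[1]][2]
end_loc <- str_locate_all(pattern = '\\)', strcol)[[1]][2]
substr(strcol, start_loc + 1, end_loc - 1)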

Expected output:

Sales

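I have also tried the non-greedy approach from the post here, but received the following error:

Error: '\[' is an unrecognized escape in character string starting ""/\["

Note: the company will always be the same.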

Answer 1

You can use sub:

> strcol <- "John Smith (FPO Sales) John Doe)"
> sub(".*\\(FPO[^)]*?(\\w+)\\).*", "\\1", strcol)
[1] "Sales"

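.*\\(FPO matches everything up to and including (FPO.
[^)]*? matches any character except ), zero or more times, as few times as possible.
(\\w+)\\) captures the last run of one or more word characters before the closing parenthesis of that same group.
.* matches all the remaining characters.
Replacing the whole match with the contents of capture group 1 therefore gives the desired output.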
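Since the question asks for the department in its own column, here is a minimal sketch of applying the same pattern to a whole data frame column. The data frame df, its user column, and the sample names are made up for illustration, and perl = TRUE is an addition of mine so that the lazy quantifier follows PCRE rules; the answer above uses the default engine.

```r
# Hypothetical sample data: two standard usernames and one with a stray ")"
df <- data.frame(
  user = c("John Smith (FPO Sales)",
           "John Smith (FPO Sales) John Doe)",
           "Jane Roe (FPO Marketing)"),
  stringsAsFactors = FALSE
)

# Same pattern as in the answer; group 1 holds the last word before ")"
pattern <- ".*\\(FPO[^)]*?(\\w+)\\).*"

# sub() is vectorised, so it processes the whole column in one call
df$department <- sub(pattern, "\\1", df$user, perl = TRUE)

print(df$department)
```

Note that sub() returns the input unchanged when the pattern does not match, so a row without "(FPO ...)" would keep its full username in the new column.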

OR

> library(stringr)
> str_extract(strcol, perl("FPO[^)]*?\\K\\w+(?=\\))"))
[1] "Sales"
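The perl() wrapper comes from older stringr releases and may not be available in recent versions, which are built on stringi/ICU. A minimal base R sketch of the same \K idea, assuming nothing beyond base R, could look like this; regexpr() with perl = TRUE uses the PCRE engine, which supports \K and lookahead.

```r
strcol <- "John Smith (FPO Sales) John Doe)"

# \K discards everything matched so far, so only the word before ")" is kept;
# (?=\\)) checks for the closing parenthesis without consuming it
m <- regexpr("FPO[^)]*?\\K\\w+(?=\\))", strcol, perl = TRUE)

# regmatches() returns the text of the match located by regexpr()
dept <- regmatches(strcol, m)
print(dept)
```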

Answer 2

gsub('.*\\s(.*)\\).*\\)$','\\1',strcol)
[1] "Sales"
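This pattern requires a second closing parenthesis at the very end of the string, so it is worth checking both username forms before relying on it. A small sketch of that check follows; the variable names are mine, not from the answer.

```r
pattern <- ".*\\s(.*)\\).*\\)$"

standard    <- "John Smith (FPO Sales)"
nonstandard <- "John Smith (FPO Sales) John Doe)"

# Two closing parentheses: the pattern matches and group 1 is the department
res_nonstandard <- gsub(pattern, "\\1", nonstandard)

# Only one closing parenthesis: no match, so gsub() returns the input as is
res_standard <- gsub(pattern, "\\1", standard)

print(res_nonstandard)
print(res_standard)
```

If both forms occur in the same column, the pattern from Answer 1 is the safer choice because it anchors on "(FPO" rather than on the trailing parenthesis.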

Using R and Regex
