Extracting the substring between special characters

Time: 2018-09-03 06:08:20

Tags: r regex string

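Suppose I have a string:

"Region/Country/Industry/Product"

I want to extract only the characters between the n-th and m-th slash. Is there a one-liner, built from existing functions, that does this?

For example, to get the string between the second and third slash of each entry in the following character vector:

c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung", 
  "AMER/US/Wireless/Verizon")

the output of such a function would be:

c("Automotive", "Technology", "Wireless")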

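4 answers:

Answer 0: (score: 4)

We can use sub to capture the word that precedes the last /, and in the replacement refer to that capture group with a backreference (\\1). Note that this picks the second-to-last field, so it matches the third field only when the string has exactly four fields: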

sub(".*[/](\\w+)[/]\\w+$", "\\1", str1)
#[1] "Automotive" "Technology" "Wireless"  

Another variation skips the first two fields from the start of the string and captures the third:

sub("^([^/]+[/]){2}([^/]+).*", "\\2", str1)
#[1] "Automotive" "Technology" "Wireless"  
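
The second pattern generalizes to any position by building the repetition count into the regex. The following is a minimal sketch, not part of the original answer; the helper name field_after_slash is hypothetical, and it assumes the string contains at least n slashes (otherwise sub leaves the input unchanged).

```r
str1 <- c("EMEA/Germany/Automotive/Mercedes",
          "APAC/SouthKorea/Technology/Samsung",
          "AMER/US/Wireless/Verizon")

# Hypothetical helper: return the field that follows the n-th slash.
field_after_slash <- function(x, n) {
  # Skip n "field/" chunks, capture the next field, discard the rest.
  pattern <- sprintf("^(?:[^/]*/){%d}([^/]*).*$", n)
  sub(pattern, "\\1", x, perl = TRUE)
}

res_regex <- field_after_slash(str1, 2)
res_regex
#[1] "Automotive" "Technology" "Wireless"
```

Because sub returns its input untouched when the pattern does not match, a string with too few slashes comes back whole rather than as NA, which is worth checking for in real data.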

Or split the string at the delimiter / and extract the third element:

sapply(strsplit(str1, "/"), `[`, 3)
#[1] "Automotive" "Technology" "Wireless"  
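
The split approach extends naturally to the general "between the n-th and m-th slash" case from the question. The sketch below is an illustration under stated assumptions, not the answerer's code: between_slashes is a hypothetical helper, it treats the text between the n-th and m-th slash as fields n + 1 through m, and it returns NA when a string has fewer than m fields.

```r
str1 <- c("EMEA/Germany/Automotive/Mercedes",
          "APAC/SouthKorea/Technology/Samsung",
          "AMER/US/Wireless/Verizon")

# Hypothetical helper: text between the n-th and m-th slash (n < m).
between_slashes <- function(x, n, m) {
  stopifnot(n >= 0, m > n)
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) {
    # Not enough fields: signal a missing value instead of guessing.
    if (length(p) < m) return(NA_character_)
    # Fields n + 1 .. m, re-joined with the original delimiter.
    paste(p[(n + 1):m], collapse = "/")
  }, character(1))
}

res_split <- between_slashes(str1, 2, 3)
res_split
#[1] "Automotive" "Technology" "Wireless"
```

With n = 1 and m = 3 the same helper returns "Germany/Automotive" for the first element, since the inner slash is kept when more than one field is selected.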

Data

str1 <-  c("EMEA/Germany/Automotive/Mercedes", 
      "APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")

Answer 1: (score: 2)

And of course a stringr solution:

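library(stringr)
# x is the character vector from the question
x <- c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")
word(x, 3, sep = '/')
#[1] "Automotive" "Technology" "Wireless"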
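
word also accepts an end position and negative indices, which covers the n-th to m-th case directly. This is a small sketch that assumes the stringr package is installed (the requireNamespace guard is only there so the snippet does nothing when it is not); the variable names are illustrative.

```r
x <- c("EMEA/Germany/Automotive/Mercedes",
       "APAC/SouthKorea/Technology/Samsung",
       "AMER/US/Wireless/Verizon")

if (requireNamespace("stringr", quietly = TRUE)) {
  # Fields 2 through 3, i.e. the text between the first and third slash.
  res_range <- stringr::word(x, 2, 3, sep = "/")
  # Negative positions count from the end: -2 is the second-to-last field.
  res_from_end <- stringr::word(x, -2, sep = "/")
  print(res_range)
  print(res_from_end)
}
```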

Answer 2: (score: 1)

You can also use strsplit as shown below and adjust the position as needed:

x <- c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")
sapply(x, FUN = function(x) {
    y <- unlist(strsplit(x, split="/"))
    y[3] # This line can be customised depending on the position of the word
    }
)
# Values of the result (sapply also names each element after its input string):
# "Automotive" "Technology" "Wireless"
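
Because sapply is called on a character vector, each result element is named after its input string. If a plain vector is wanted, USE.NAMES = FALSE suppresses that; the sketch below contrasts the two, with res_named and res_plain as illustrative names.

```r
x <- c("EMEA/Germany/Automotive/Mercedes",
       "APAC/SouthKorea/Technology/Samsung",
       "AMER/US/Wireless/Verizon")

# Pick the third field of a single string.
third_field <- function(s) strsplit(s, "/", fixed = TRUE)[[1]][3]

# Default behaviour: the input strings become the element names.
res_named <- sapply(x, third_field)

# Same values, but without names.
res_plain <- sapply(x, third_field, USE.NAMES = FALSE)
res_plain
#[1] "Automotive" "Technology" "Wireless"
```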

Answer 3: (score: 0)

You can also remove the parts you do not need, that is, the first two fields and the last one:

strings <- c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung","AMER/US/Wireless/Verizon")

gsub("^([^/]*/){2}|/[^/]*$","",strings)

#[1] "Automotive" "Technology" "Wireless"
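
The alternation in that pattern does two separate deletions, which is easier to see when they are run one at a time. The following sketch splits the gsub call into two sub calls (the intermediate names are illustrative) and also shows what happens with a hypothetical five-field string.

```r
strings <- c("EMEA/Germany/Automotive/Mercedes",
             "APAC/SouthKorea/Technology/Samsung",
             "AMER/US/Wireless/Verizon")

# Step 1: drop the first two "field/" chunks, anchored at the start.
no_prefix <- sub("^([^/]*/){2}", "", strings)
no_prefix
#[1] "Automotive/Mercedes" "Technology/Samsung"  "Wireless/Verizon"

# Step 2: drop the last slash and everything after it.
no_suffix <- sub("/[^/]*$", "", no_prefix)
no_suffix
#[1] "Automotive" "Technology" "Wireless"

# With five fields, only the first two and the last one are removed.
five_fields <- gsub("^([^/]*/){2}|/[^/]*$", "", "a/b/c/d/e")
five_fields
#[1] "c/d"
```

So this approach isolates the third field only when the input has exactly four fields; for longer paths it keeps everything between the second and the last slash.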