R - 从字符串

时间:2017-11-03 16:22:41

标签: r regex string parsing gsub

我已经看过许多提取w / gsub的迭代,但它们主要处理从左到右或一次出现后的提取。我希望从右到左匹配,计算-的四次出现,匹配第3次和第4次出现之间的所有内容。

例如:

string                       outcome
here-are-some-words-to-try   some
a-b-c-d-e-f-g-h-i            f

以下是我尝试过的一些参考资料:

3 个答案:

答案 0 :(得分:2)

x = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
sapply(x, function(strings){
    ind = unlist(gregexpr(pattern = "-", text = strings))
    if (length(ind) < 4){NA}
    else{substr(strings, ind[length(ind) - 3] + 1, ind[length(ind) - 2] - 1)}
})
#here-are-some-words-to-try          a-b-c-d-e-f-g-h-i 
#                    "some"                        "f" 

答案 1 :(得分:1)

您可以使用

([^-]+)(?:-[^-]+){3}$

a demo on regex101.com

<小时/> 在R中,这可能是

library(dplyr)
library(stringr)
df <- data.frame(string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'), stringsAsFactors = FALSE)

df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[,2])
df

和收益

                      string outcome
1 here-are-some-words-to-try    some
2          a-b-c-d-e-f-g-h-i       f
3            no dash in here    <NA>

答案 2 :(得分:0)

分裂你的句子怎么样?像

这样的东西
string <- "here-are-some-words-to-try"

# separate all words
val <- strsplit(string, "-")[[1]]

# reverse the order
val rev(val)

# take the 4th element
val[4]

# And using a dataframe
library(tidyverse)
tibble(string = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")) %>% 
mutate(outcome = map_chr(string, function(s) rev(strsplit(s, "-")[[1]])[4]))