Question

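I have a character variable that I want to split into two variables on the "-" delimiter. However, I only want to split on the last delimiter, because the string may contain more than one "-". For example: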

Input          Output1  Output2
foo - bar      foo      bar
hey-now-man    hey-now  man
say-now-girl   say-now  girl
fine-now       fine     now

I have tried using strsplit, to no avail.

Answer 1

A solution based on stringi and data.table: reverse each string, split it into a fixed number of pieces (n = 2), then reverse the pieces back:

library(stringi)
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')

lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)

If we want the result as a data.frame:

y <- lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)

y <- setNames(data.table::transpose(y)[2:1], c('output1', 'output2'))

df <- as.data.frame(c(list(input = x), y))

# > df
# input output1 output2
# 1    foo - bar     foo     bar
# 2  hey-now-man hey-now     man
# 3 say-now-girl say-now    girl
# 4     fine-now    fine     now
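
The reverse, split, reverse-back idea can also be sketched in base R, which may help to see why it works: the first delimiter of the reversed string is the last delimiter of the original. This is only an illustration, not the stringi implementation; `rev_string` is a hypothetical helper written for this sketch, and `[-[:space:]]+` is assumed to be an acceptable stand-in for the `[-\\s]+` pattern above.

```r
# Hypothetical helper (not part of stringi): reverse each string character by character
rev_string <- function(s) {
  vapply(strsplit(s, NULL), function(ch) paste(rev(ch), collapse = ""), character(1))
}

x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")
rx <- rev_string(x)

# First run of hyphens/whitespace in the reversed string
m <- regexpr("[-[:space:]]+", rx)
start <- as.integer(m)
len <- attr(m, "match.length")

# Pieces of the reversed string before and after that run
tail_rev <- substr(rx, 1, start - 1)
head_rev <- substr(rx, start + len, nchar(rx))

# Reverse the pieces back to get the original orientation
output1 <- rev_string(head_rev)
output2 <- rev_string(tail_rev)
data.frame(input = x, output1, output2)
```

One caveat with this pattern, here and in the stringi version: whitespace counts as a delimiter too, so a string whose last part contains a space would be split at that space rather than at the last "-".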

Answer 2

You can try using gregexpr:

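a <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")
# Position of the last "-" in each string (gregexpr returns one vector of positions per element)
lastdelim <- sapply(gregexpr("-", a, fixed = TRUE), tail, n = 1)
# substr() is vectorised; trimws() drops the spaces around the delimiter in "foo - bar"
output1 <- trimws(substr(a, 1, lastdelim - 1))
output2 <- trimws(substr(a, lastdelim + 1, nchar(a)))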
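
Note that gregexpr returns a list with one vector of match positions per input string, so the position of the last "-" differs from string to string and has to be taken per element. A quick base R check, offered as a sketch (the names `pos` and `last_pos` are mine):

```r
a <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")

# One integer vector of "-" positions per element of a
pos <- gregexpr("-", a, fixed = TRUE)

# Take the last position within each element; this is -1 if a string has no "-"
last_pos <- vapply(pos, function(p) p[length(p)], integer(1))
last_pos
```

Using only `pos[[1]]` would apply the hyphen position of "foo - bar" to every string, which cuts the longer strings in the wrong place.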

Answer 3

You can also use a negative lookahead:

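library(dplyr)   # provides tibble() and %>%
library(tidyr)   # provides separate()

df <- tibble(input = c("foo - bar", "hey-now-man", "say-now-girl", "fine-now"))

# Split on the "-" (plus any surrounding whitespace) that has no later "-" after it
df %>% 
    separate(input, into = c("output1", "output2"), sep = "\\s*-\\s*(?!.*-)", remove = FALSE)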
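
To see what the negative lookahead does, it may help to try it in base R, where lookarounds need `perl = TRUE`. The sketch below is an illustration rather than what separate() does internally: `(?!.*-)` only lets a "-" match when no other "-" follows it, so the match lands on the last hyphen. The optional `\\s*` around the hyphen is my addition to absorb the spaces in "foo - bar".

```r
x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")

# Position of the hyphen that is not followed by any later hyphen
last_hyphen <- as.integer(regexpr("-(?!.*-)", x, perl = TRUE))

# Split on that hyphen, swallowing any whitespace around it
parts <- strsplit(x, "\\s*-\\s*(?!.*-)", perl = TRUE)
parts
```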

References:

[1] https://frightanic.com/software-development/regex-match-last-occurrence/

[2] https://www.regular-expressions.info/lookaround.html

Answer 4

Using unglue, you can do:

# install.packages("unglue")
library(unglue)
df <- data.frame(input = c("foo - bar","hey-now-man","say-now-girl","fine-now"))
unglue_unnest(df, input, "{output1}{=\\s*-\\s*}{output2=[^-]+}", remove = FALSE)
#>          input output1 output2
#> 1    foo - bar     foo     bar
#> 2  hey-now-man hey-now     man
#> 3 say-now-girl say-now    girl
#> 4     fine-now    fine     now

Created on 2019-11-06 by the reprex package (v0.3.0)
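
For readers without unglue, the pattern above can be read as roughly one anchored regular expression with two capture groups: anything (lazily), then a hyphen with optional whitespace, then a final run of non-hyphen characters. The base R sketch below is my own reading of that pattern, not unglue's documented internals, and it assumes every input contains at least one "-".

```r
x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")

# Group 1: everything before the last delimiter; group 2: the hyphen-free tail
m <- regmatches(x, regexec("^(.*?)\\s*-\\s*([^-]+)$", x, perl = TRUE))

# Each element of m is c(full match, group 1, group 2)
parts <- do.call(rbind, m)
df <- data.frame(input = x, output1 = parts[, 2], output2 = parts[, 3])
df
```

Because the second group cannot contain a hyphen and the pattern is anchored at the end, the split can only happen at the last "-".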

Split a string on only the last delimiter in R
