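I have a character variable that I want to split into two variables on the delimiter " - ". However, I only want to split on the last delimiter, because the string may contain more than one " - ". For example: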
Input         Output1  Output2
foo - bar     foo      bar
hey-now-man   hey-now  man
say-now-girl  say-now  girl
fine-now      fine     now
I tried using strsplit, to no avail.
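Answer 0: (score: 1)
A solution based on stringi and data.table: reverse each string, split it into a fixed number of pieces (n = 2), then reverse the pieces back: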
library(stringi)
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')
lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
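The reverse, split once, reverse back idea can also be sketched in base R, which makes the intermediate steps visible. This is only an illustration under stated assumptions: `str_rev` is a hypothetical helper written for this example (it is not part of stringi), and `[-[:space:]]+` is used as a base-R stand-in for the `[-\\s]+` pattern above.

```r
x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")

# Hypothetical helper: reverse the characters of each string.
str_rev <- function(s) {
  vapply(strsplit(s, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

reversed <- str_rev(x)
# regexpr() reports only the first match, which corresponds to the
# last delimiter of the original (unreversed) string.
first_hit <- regexpr("[-[:space:]]+", reversed)
# invert = TRUE returns the text before and after that single match.
pieces <- regmatches(reversed, first_hit, invert = TRUE)
# Undo the reversal of each piece and restore left-to-right order.
result <- lapply(pieces, function(p) rev(str_rev(p)))
```

Each element of `result` holds the part before the last delimiter followed by the part after it.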
To do the same and get a data.frame:
y <- lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
y <- setNames(data.table::transpose(y)[2:1], c('output1', 'output2'))
df <- as.data.frame(c(list(input = x), y))
# > df
#          input output1 output2
# 1    foo - bar     foo     bar
# 2  hey-now-man hey-now     man
# 3 say-now-girl say-now    girl
# 4     fine-now    fine     now
Answer 1: (score: 0)
You can try using gregexpr:
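a <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")
# Position of the last "-" in each string (taking only [[1]] would reuse the first string's position for all)
lastdelim <- sapply(gregexpr("-", a, fixed = TRUE), function(m) tail(m, n = 1))
output1 <- trimws(substr(a, 1, lastdelim - 1))
output2 <- trimws(substr(a, lastdelim + 1, nchar(a)))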
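The position of the last delimiter differs from string to string, so it has to be computed per element rather than once for the whole vector. A minimal sketch of that idea wrapped in a function follows; `split_at_last` and its `delim` argument are hypothetical names introduced here for illustration, and trimming the surrounding spaces with `trimws` is an assumption made to match the desired output for "foo - bar".

```r
# Hypothetical helper, not part of the original answer.
split_at_last <- function(s, delim = "-") {
  # gregexpr() returns one vector of match positions per input string.
  hits <- gregexpr(delim, s, fixed = TRUE)
  # Keep only the last match position of each string.
  last <- vapply(hits, function(m) as.integer(tail(m, 1)), integer(1))
  data.frame(
    input   = s,
    output1 = trimws(substr(s, 1, last - 1)),
    output2 = trimws(substr(s, last + nchar(delim), nchar(s))),
    stringsAsFactors = FALSE
  )
}

res <- split_at_last(c("foo - bar", "hey-now-man", "say-now-girl", "fine-now"))
```

Because `substr` is vectorised over its start and stop arguments, no explicit loop over the strings is needed once `last` is a vector.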
Answer 2: (score: 0)
You can also use a negative lookahead:
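library(tibble)
library(tidyr)  # provides separate() and re-exports %>%
df <- tibble(input = c("foo - bar", "hey-now-man", "say-now-girl", "fine-now"))
# (?!.*-) matches only the last "-"; \\s* also drops the spaces around it, as in "foo - bar"
df %>%
  separate(input, into = c("output1", "output2"), sep = "\\s*-\\s*(?!.*-)", remove = FALSE)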
Reference:
[1] https://frightanic.com/software-development/regex-match-last-occurrence/
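Since the question mentions strsplit, here is a sketch of the same negative-lookahead idea in base R. It assumes `perl = TRUE` (needed for lookahead support) and adds `\\s*` around the hyphen so that the spaces in "foo - bar" are consumed by the delimiter; the variable names are illustrative only.

```r
x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")

# (?!.*-) rejects any "-" that is followed by another "-" later in the
# string, so only the last one is used as the split point.
parts <- strsplit(x, "\\s*-\\s*(?!.*-)", perl = TRUE)

# Every string here contains a delimiter, so each element has two pieces
# and they can be bound into a two-column character matrix.
out <- do.call(rbind, parts)
colnames(out) <- c("output1", "output2")
```

Strings without any "-" would yield a single piece, so the `rbind` step would need extra handling in that case.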
Answer 3: (score: 0)
With unglue, you can do this:
# install.packages("unglue")
library(unglue)
df <- data.frame(input = c("foo - bar","hey-now-man","say-now-girl","fine-now"))
unglue_unnest(df, input, "{output1}{=\\s*-\\s*}{output2=[^-]+}", remove = FALSE)
#>          input output1 output2
#> 1    foo - bar     foo     bar
#> 2  hey-now-man hey-now     man
#> 3 say-now-girl say-now    girl
#> 4     fine-now    fine     now
Created on 2019-11-06 by the reprex package (v0.3.0)