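Regex to replace a symbol after a certain number of occurrences

Time: 2016-03-14 15:24:41

Tags: regex r

I received a semicolon-delimited CSV file. However, the last column contains free text that itself contains semicolons.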

frank; 1; 103; his name is frank
alice; 2; 09; sometimes; she says hi
kim; 3; 123; bla;bla;bla;

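The first three semicolons separate my columns, but all remaining ones must be replaced with --. Is there a way to do this with a regular expression? I keep running into problems because a lookbehind must be of fixed length. The result should look like this: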

frank; 1; 103; his name is frank
alice; 2; 09; sometimes-- she says hi
kim; 3; 123; bla--bla--bla--

I am using PCRE in R.

2 Answers:

Answer 0: (score: 1)

Here is a non-regex solution using strsplit.

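data <- c("frank; 1; 103; his name is frank",
          "alice; 2; 09; sometimes; she says hi",
          "kim; 3; 123; bla;bla;bla;")
# Split each line on ";" once and reuse the pieces
parts <- strsplit(data, ";", fixed = TRUE)
# The first three fields keep ";" as their separator
front <- sapply(parts, function(x) paste(x[1:3], collapse = ";"))
# The remaining pieces (the free text) are rejoined with "--"
back <- sapply(parts, function(x) paste(x[-(1:3)], collapse = "--"))
paste(front, back, sep = ";")
# [1] "frank; 1; 103; his name is frank"
# [2] "alice; 2; 09; sometimes-- she says hi"
# [3] "kim; 3; 123; bla--bla--bla"
# Note: strsplit() drops a trailing empty piece, so the final ";" of the
# third line is lost instead of becoming "--".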
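Since the question asks specifically for a PCRE pattern, a regex-only variant may be worth sketching for comparison. It relies on the PCRE backtracking verbs (*SKIP)(*F) rather than a lookbehind, and it assumes every line contains at least three semicolons; the variable names are illustrative only.

```r
# Sample lines from the question
lines <- c("frank; 1; 103; his name is frank",
           "alice; 2; 09; sometimes; she says hi",
           "kim; 3; 123; bla;bla;bla;")

# First alternative: consume the first three fields together with their
# semicolons, then (*SKIP)(*F) discards that match and prevents the engine
# from retrying inside the consumed text.
# Second alternative: any semicolon still reachable gets replaced.
# Assumption: each line has at least three semicolons.
pattern <- "^(?:[^;]*;){3}(*SKIP)(*F)|;"
result <- gsub(pattern, "--", lines, perl = TRUE)
print(result)
```

Unlike the strsplit version, this should also turn a trailing semicolon into --, because nothing is split and rejoined.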

Answer 1: (score: 0)

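I'm going to go out on a limb based on your description and guess that you are trying to solve the wrong problem. Since you talk about "columns" with the semicolon as the delimiter, but the semicolon may also appear as just a value in the third column, I would suggest you try something like separate from "tidyr" or stri_split_fixed from "stringi".

Here is what those approaches look like, using the following sample data:

myString <- c("frank; 1; 103; his name is frank", 
              "alice; 2; 09; sometimes; she says hi",
              "kim; 3; 123; bla;bla;bla;")

"stringi"

library(stringi)
stri_split_fixed(myString, ";", n = 3, simplify = TRUE)
#      [,1]    [,2] [,3]                         
# [1,] "frank" " 1" " 103; his name is frank"    
# [2,] "alice" " 2" " 09; sometimes; she says hi"
# [3,] "kim"   " 3" " 123; bla;bla;bla;"

"dplyr" + "tidyr"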
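A minimal sketch of how separate from "tidyr" might be applied to the same data. This is an illustration under assumptions, not the answer's original code: the column names (raw, name, id, code, text) are hypothetical, and four target columns are used to match the three separating semicolons described in the question.

```r
library(tidyr)

# Same sample data, placed in a one-column data frame.
# The column name "raw" is an assumption made for this sketch.
df <- data.frame(raw = c("frank; 1; 103; his name is frank",
                         "alice; 2; 09; sometimes; she says hi",
                         "kim; 3; 123; bla;bla;bla;"),
                 stringsAsFactors = FALSE)

# extra = "merge" splits at most length(into) times, so any further
# semicolons stay inside the last column. Column names are hypothetical.
cols <- separate(df, raw, into = c("name", "id", "code", "text"),
                 sep = ";", extra = "merge")

# Replace the remaining semicolons in the free-text column only
cols$text <- gsub(";", "--", cols$text, fixed = TRUE)
print(cols)
```

Once the free text sits in its own column, the semicolon replacement becomes a plain fixed-string substitution and no lookbehind is needed. The fields keep their leading spaces; trimws() could be applied if that matters.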