I received a semicolon-separated CSV file. However, the last column contains free text that itself includes semicolons:
frank; 1; 103; his name is frank
alice; 2; 09; sometimes; she says hi
kim; 3; 123; bla;bla;bla;
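The first three semicolons separate my columns, but all the others must be replaced with --. Is there a way to do this with a regular expression? I keep running into problems because a lookbehind has to be fixed-length. The result should look like this: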
frank; 1; 103; his name is frank
alice; 2; 09; sometimes-- she says hi
kim; 3; 123; bla--bla--bla--
I'm using PCRE in R.
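A minimal regex-only sketch, assuming R's gsub() with perl = TRUE: because PCRE lookbehinds must be fixed-length, it uses \K (discard everything matched so far) together with \G (start exactly where the previous match ended) instead of a lookbehind. The helper name fix_free_text() is hypothetical.

```r
# Hypothetical helper; the pattern is what matters.
fix_free_text <- function(x) {
  # Alternative 1: at the start of the line, skip the first three fields.
  # Alternative 2: \G(?!^) continues where the previous match ended,
  #   but never at the very start of the string.
  # [^;]* then consumes text up to the next ";", and \K drops all of that
  # from the reported match, so only the ";" itself is replaced.
  gsub("(?:^(?:[^;]*;){3}|\\G(?!^))[^;]*\\K;", "--", x, perl = TRUE)
}

lines <- c("frank; 1; 103; his name is frank",
           "alice; 2; 09; sometimes; she says hi",
           "kim; 3; 123; bla;bla;bla;")
result <- fix_free_text(lines)
print(result)
```

Lines with three or fewer semicolons come back unchanged, and the trailing semicolon in the kim line is replaced as well.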
Answer 0 (score: 1)
Here is a non-regex solution using strsplit():
data <- c("frank; 1; 103; his name is frank",
          "alice; 2; 09; sometimes; she says hi",
          "kim; 3; 123; bla;bla;bla;")
parts <- strsplit(data, ";")
# First three fields, joined back together with ";"
front <- sapply(lapply(parts, "[", 1:3), paste, collapse = ";")
# Remaining (free-text) pieces, joined with "--" instead
back <- sapply(lapply(parts, "[", -(1:3)), paste, collapse = "--")
paste(front, back, sep = ";")
# [1] "frank; 1; 103; his name is frank"
# [2] "alice; 2; 09; sometimes-- she says hi"
# [3] "kim; 3; 123; bla--bla--bla"
# Note: strsplit() drops the trailing empty string, so row 3 loses its final "--".
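If the trailing "--" matters, here is a sketch of a variant in the same non-regex spirit: it locates the third semicolon with gregexpr(..., fixed = TRUE) instead of splitting, so no empty trailing field is lost. replace_after_n() and its arguments are hypothetical names, and it assumes n is at least 1.

```r
# Hypothetical helper: keep the first n separators, replace the rest.
# Fixed-string matching only; assumes n >= 1.
replace_after_n <- function(x, n = 3, sep = ";", repl = "--") {
  vapply(x, function(s) {
    # Positions of every separator in this line (a single -1 if none)
    pos <- gregexpr(sep, s, fixed = TRUE)[[1]]
    # Nothing to replace unless there are more than n separators
    if (length(pos) <= n) return(s)
    cut <- pos[n]
    keep <- substr(s, 1, cut)             # up to and including the nth sep
    rest <- substr(s, cut + 1, nchar(s))  # the free-text remainder
    paste0(keep, gsub(sep, repl, rest, fixed = TRUE))
  }, character(1), USE.NAMES = FALSE)
}

x <- c("frank; 1; 103; his name is frank",
       "alice; 2; 09; sometimes; she says hi",
       "kim; 3; 123; bla;bla;bla;")
print(replace_after_n(x))
```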
Answer 1 (score: 0)
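I'm going to go out on a limb based on your description and guess that you're trying to solve the wrong problem. Since you talk about "columns", with the semicolon as the delimiter but also possibly appearing as a value in the last (free-text) column, I'd suggest trying something like separate() from "tidyr" or stri_split_fixed() from "stringi".
Here is what the stri_split_fixed() approach looks like, using the following sample data: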
myString <- c("frank; 1; 103; his name is frank",
"alice; 2; 09; sometimes; she says hi",
"kim; 3; 123; bla;bla;bla;")
library(stringi)
# n = 4: split at the first three semicolons only, giving four columns
stri_split_fixed(myString, ";", n = 4, simplify = TRUE)
#      [,1]    [,2] [,3]   [,4]
# [1,] "frank" " 1" " 103" " his name is frank"
# [2,] "alice" " 2" " 09"  " sometimes; she says hi"
# [3,] "kim"   " 3" " 123" " bla;bla;bla;"
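If stringi or tidyr isn't installed, the same split-into-columns idea can be sketched in base R by swapping regexec()/regmatches() in for stri_split_fixed(). This assumes every line has at least three semicolons; the column names are made up for illustration.

```r
# Sample data from the question
myString <- c("frank; 1; 103; his name is frank",
              "alice; 2; 09; sometimes; she says hi",
              "kim; 3; 123; bla;bla;bla;")

# Capture the first three fields and everything after the third ";".
# Assumes at least three semicolons per line; a line without them yields
# character(0) here and would be silently dropped by the rbind below.
pattern <- "^([^;]*);([^;]*);([^;]*);(.*)$"
m <- regmatches(myString, regexec(pattern, myString))
fields <- do.call(rbind, lapply(m, `[`, -1))  # drop the full-match element

df <- data.frame(fields, stringsAsFactors = FALSE)
df[] <- lapply(df, trimws)                    # strip the spaces after each ";"
names(df) <- c("name", "id", "code", "text")  # hypothetical column names

# Only if the "--" form is still needed: operate on the text column alone
df$text_dashed <- gsub(";", "--", df$text, fixed = TRUE)
print(df)
```

Keeping the free text intact in its own column is the point of this answer; the text_dashed column only shows that the "--" version then becomes a one-line gsub() on that column.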