Question

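I am using data.table's tstrsplit to create three columns from a single column, for several tables. The source column is a character vector whose values contain 2 to 4 spaces. I need to split on the first and the last space.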

For tables whose source column contains exactly two spaces, the solution needs no regular expression:

> tbl = data.table('source.col'=c('enable synergistic vortals', 'architect compelling niches', 'mesh global deliverables'))

> tbl
                    source.col
1:  enable synergistic vortals
2: architect compelling niches
3:    mesh global deliverables

> tbl[, c('before', 'base', 'after') := tstrsplit(source.col, ' ', fixed=TRUE)]
> tbl
                    source.col    before        base        after
1:  enable synergistic vortals    enable synergistic      vortals
2: architect compelling niches architect  compelling       niches
3:    mesh global deliverables      mesh      global deliverables

For tables where source.col contains n > 2 spaces, I have not found a regular expression that works.

> tbl = data.table('source.col'=c('enable synergistic vortals implement', 'architect compelling niches systems', 'mesh global deliverables enable'))
> tbl
                             source.col
1: enable synergistic vortals implement
2:  architect compelling niches systems
3:      mesh global deliverables enable

I have a reliable regex for splitting on the last space, ' (?!.* )', but the option I found for splitting on the first space, ^[^ ]+, returns all but the last of the new columns.
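For reference, the last-space pattern can be tried on its own. This is only a minimal sketch: the table name demo and the column names head and after are made up for illustration, and perl = TRUE is assumed to be needed because the pattern uses a lookahead.

```r
library(data.table)

# Hypothetical demo table; the strings are the ones from the example above.
demo <- data.table(source.col = c("enable synergistic vortals implement",
                                  "architect compelling niches systems",
                                  "mesh global deliverables enable"))

# ' (?!.* )' matches a space that has no other space after it, i.e. the last one.
# Lookaheads are a PCRE feature, so perl = TRUE is passed through to strsplit().
demo[, c("head", "after") := tstrsplit(source.col, " (?!.* )", perl = TRUE)]
```

This yields two columns: everything before the last space, and the final word. It does not yet separate the first word from the middle part.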

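My question is twofold: 1) how do I split on the first space, and 2) how do I combine the regex for splitting on the first space with the regex for splitting on the last space (perhaps with |) to get a result like this: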

> tbl
                             source.col    before                base     after
1: enable synergistic vortals implement    enable synergistic vortals implement
2:  architect compelling niches systems architect   compelling niches   systems
3:      mesh global deliverables enable      mesh global deliverables    enable

Answer 1

We can use fread after creating a delimiter with sub from base R:

library(data.table)
tbl[,c('before', 'base', 'after') := fread(text =
      sub("^(\\w+) (.*) (\\w+)$", "\\1,\\2,\\3", 
        source.col), header = FALSE)]
tbl
#                             source.col    before                base     after
#1: enable synergistic vortals implement    enable synergistic vortals implement
#2:  architect compelling niches systems architect   compelling niches   systems
#3:      mesh global deliverables enable      mesh global deliverables    enable
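
A possible variation on the same idea, sketched under the assumption that every value contains at least two spaces and never contains a tab character: mark the first and last space with a tab via sub, then split on that tab with tstrsplit. The names tbl2 and marked are illustrative only. Compared with the comma version, this keeps the pieces as character and is not confused by commas in the text.

```r
library(data.table)

# Illustrative table; mixes two-space and three-space values.
tbl2 <- data.table(source.col = c("enable synergistic vortals implement",
                                  "architect compelling niches",
                                  "mesh global deliverables enable"))

# Group 1 is the first word, group 3 the last word; the greedy (.*) takes
# everything in between, so only the first and last spaces become tabs.
marked <- sub("^([^ ]+) (.*) ([^ ]+)$", "\\1\t\\2\t\\3", tbl2$source.col)

# Split on the literal tab; fixed = TRUE avoids any regex interpretation.
tbl2[, c("before", "base", "after") := tstrsplit(marked, "\t", fixed = TRUE)]
```

Values with fewer than two spaces would not match the pattern and would pass through sub unchanged, so they would need separate handling.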

R data.table / regex: split on the first and last occurrence of a character
