Insert a hyphen or dash into a vector of strings depending on the position of specific elements

Time: 2016-02-03 13:47:49

Tags: regex r string vector gsub

Given the vector vecA:

vecA <- c("Population 1222",
          "Population 90over",
          "population under78",
          "population 99101",
          "Population 1254", 
          "Population 78 92")

Problem

I would like to arrive at the corresponding vecB:

vecB <- c("Population 12 - 22",
          "Population 90 over",
          "population under 78",
          "population 99 - 101",
          "Population 12 - 54", 
          "Population 78 - 92")

Key characteristics

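vecB has the following characteristics:

  • After the first two digits, a space, a dash, and a space are inserted ( - )
  • If only a space is present between the numbers, a dash is inserted ( - )
  • For combinations such as underDigitDigit, only a space is inserted: under DigitDigit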

Attempt

I was thinking of making use of capture groups in gsub:

gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)

But this does not work for all cases:

> t(t(gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)))
     [,1]                
[1,] "12"                
[2,] "90"                
[3,] "population under78"
[4,] "99"                
[5,] "12"                
[6,] "78" 

t() is used for presentation purposes only; regex101 link
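To make the output above easier to read, here is a small sketch that lists what each capture group of the attempted pattern holds. It uses base R's regexec() and regmatches(); the names `pattern` and `groups` are only illustrative. Because the pattern consumes the whole string and the replacement is just "\\2", every matching element collapses to its first two digits, while an element that does not match at all (the "under78" case) is returned unchanged.

```r
vecA <- c("Population 1222",
          "Population 90over",
          "population under78",
          "population 99101",
          "Population 1254",
          "Population 78 92")

# Same pattern as in the attempt (illustrative variable name)
pattern <- "^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$"

# For each element: the full match followed by capture groups 1 to 3
groups <- regmatches(vecA, regexec(pattern, vecA))

groups[[1]]  # full match, "Population ", "12", "22"
groups[[3]]  # character(0): "under" sits where the two digits are expected
```

Group 3 holds everything after the first two digits, so a replacement that reuses "\\1" and "\\3" keeps that text instead of discarding it.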

1 answer:

Answer 0 (score: 2)

Here is my suggestion: do it in two steps. 1) First add the hyphen between the numbers, and then 2) add a space between the words "over"/"under" and the number:

vecA <- c("Population 1222",
           "Population 90over",
           "population under78",
           "population 99101",
           "Population 1254", 
           "Population 78 92")
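# Step 1: put " - " between the first two digits and the digit that follows them
v <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])", "\\1\\2 - \\3", vecA)
# Step 2: put a space between "over"/"under" and the adjacent number (PCRE syntax)
gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))", "\\1\\2 \\3", v, perl = TRUE)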

Output of the code demo:

[1] "Population 12 - 22"  "Population 90 over"  "population under 78"
[4] "population 99 - 101" "Population 12 - 54"  "Population 78 - 92"

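The second regex contains a branch reset group, (?|...|...), which keeps the same group IDs across the alternative subpatterns, so \\2 and \\3 refer to the captured text in whichever alternative matched. Branch reset is PCRE syntax, which is why the perl argument has to be set to TRUE.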
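To see what the branch reset group saves, here is a sketch of the same second step written without it, as two plain substitutions with R's default regex engine, one per ordering of word and number. The names `step1` and `step2` are only illustrative, and the two patterns are not anchored to the start of the string, so on other data they could split "over"/"under" from a number in more places than the original regex would.

```r
vecA <- c("Population 1222", "Population 90over", "population under78",
          "population 99101", "Population 1254", "Population 78 92")
vecB <- c("Population 12 - 22", "Population 90 over", "population under 78",
          "population 99 - 101", "Population 12 - 54", "Population 78 - 92")

# Step 1: identical to the first gsub() call in the answer
step1 <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])",
              "\\1\\2 - \\3", vecA)

# Step 2 without branch reset: handle "90over" and "under78" separately
step2 <- gsub("([[:digit:]])(over|under)", "\\1 \\2", step1)
step2 <- gsub("(over|under)([[:digit:]])", "\\1 \\2", step2)

# Compare against the target vector from the question
identical(step2, vecB)
```

With the branch reset group, both orderings share group numbers 2 and 3, so a single replacement string covers them in one call.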