vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
I want to arrive at the corresponding vecB:
vecB <- c("Population 12 - 22",
"Population 90 over",
"population under 78",
"population 99 - 101",
"Population 12 - 54",
"Population 78 - 92")
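vecB has the following characteristics:
- where two numbers are adjacent (with or without a space between them), " - " is inserted between them
- for combinations such as underDigitDigit, only a space is inserted: under DigitDigit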
I was thinking of using capture groups in gsub:
gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)
But this does not work for all cases:
> t(t(gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)))
[,1]
[1,] "12"
[2,] "90"
[3,] "population under78"
[4,] "99"
[5,] "12"
[6,] "78"
t() is used for display purposes only; see also the regex101 link.
Answer 0 (score: 2)
Here is my suggestion, done in two steps: 1) first add the hyphen between the numbers, then 2) add a space between the word "over"/"under" and the number:
vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
v <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])", "\\1\\2 - \\3", vecA)
gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))", "\\1\\2 \\3", v, perl = TRUE)
Output (code demo):
[1] "Population 12 - 22" "Population 90 over" "population under 78"
[4] "population 99 - 101" "Population 12 - 54" "Population 78 - 92"
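To see what each of the two steps contributes, it can help to look at the intermediate result. A minimal sketch that reuses the two patterns from above; the names step1 and step2 are only illustrative:

```r
vecA <- c("Population 1222", "Population 90over", "population under78",
          "population 99101", "Population 1254", "Population 78 92")

# Step 1: insert " - " between the leading two digits and the next digit
# (any whitespace between them is dropped).
step1 <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])",
              "\\1\\2 - \\3", vecA)
print(step1)
# Elements 2 and 3 do not match this pattern and pass through unchanged.

# Step 2: insert a space between "over"/"under" and the adjacent number.
step2 <- gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))",
              "\\1\\2 \\3", step1, perl = TRUE)
print(step2)
```

Since gsub returns an element unchanged when its pattern does not match, the order of the two steps should not matter for this particular input.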
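The second regex contains a branch reset group, (?|...|...), which keeps the same group IDs in each alternative subpattern. That is why "\\2 \\3" works for both "90over" and "under78", and why perl = TRUE is required: the default regex engine does not support this construct.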
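For comparison, here is a rough sketch of the same second step without branch reset, using an ordinary non-capturing group. This is not part of the original answer; the variable names x and res are illustrative, and it relies on R substituting an empty string for a group that did not take part in the match:

```r
x <- c("Population 90over", "population under78")

# With (?:...) each alternative gets its own group numbers:
# groups 2 and 3 for the "under78" form, groups 4 and 5 for the "90over" form.
# Only one pair is set per match, so the replacement lists both pairs.
res <- gsub("^([[:alpha:]]+[[:blank:]]+)(?:(over|under)(\\d+)|(\\d+)(over|under))",
            "\\1\\2\\4 \\3\\5", x, perl = TRUE)
print(res)
```

The branch reset version avoids this bookkeeping, because both alternatives share group numbers 2 and 3.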