Remove multiple punctuation marks scattered throughout a string

Date: 2018-11-27 01:33:48

Tags: r regex regex-lookarounds

These are the default column names that were produced:


> columnNames
[1] "chain:1.theta[1]" "chain:1.theta[2]" "chain:1.theta[3]" "chain:1.theta[4]"

I would like columnNames to be:

[1] "theta1" "theta2" "theta3" "theta4"

I want to do this with a single regular expression. I tried several different approaches, but none of them worked:

> gsub('chain:[[:digit:]][[:punct:]]', '', columnNames)
[1] "theta[1]" "theta[2]" "theta[3]" "theta[4]"

> gsub('chain:[[:digit:]].\\[|\\]', '', columnNames)
[1] "chain:1.theta[1" "chain:4.theta[2" "chain:1.theta[3" "chain:4.theta[4"
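The attempt above removes only the closing bracket because `|` splits the whole pattern into two alternatives. The first alternative, `chain:[[:digit:]].\\[`, needs a `[` right after the character that follows the digit, which never happens, so only the second alternative, `\\]`, ever matches. A minimal sketch of a single-regex fix in the same spirit, assuming the prefix is always `chain:`, one or more digits and a dot, lists each unwanted piece as its own alternative:

```r
columnNames <- c("chain:1.theta[1]", "chain:1.theta[2]",
                 "chain:1.theta[3]", "chain:1.theta[4]")

# Each alternative removes one unwanted piece:
#   chain:[[:digit:]]+\\.  the "chain:N." prefix (N may have several digits)
#   \\[                    the opening square bracket
#   \\]                    the closing square bracket
cleaned <- gsub("chain:[[:digit:]]+\\.|\\[|\\]", "", columnNames)
print(cleaned)
# [1] "theta1" "theta2" "theta3" "theta4"
```

Unlike the answer below, this deletes the unwanted pieces rather than rebuilding the name from capture groups. Both approaches give the same result for these inputs.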

> gsub('(?=.*chain:[[:digit:]][[:punct:]])(?=.*"\\[|\\])', '', columnNames, perl = TRUE)
[1] "chain:1.theta[1]" "chain:4.theta[2]" "chain:1.theta[3]" "chain:4.theta[4]"
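The lookahead attempt leaves the strings unchanged because a pattern made only of lookaheads `(?=...)` is zero-width. It asserts what comes next without consuming any characters, so every match is an empty string, and replacing an empty string with `''` changes nothing. Lookarounds only help when they sit next to tokens that actually consume text. As one illustration (not the approach used in the answer), the sketch below uses a lookahead to remove the `chain:N.` prefix only when a letter follows it, and a lookbehind to remove `]` only when it follows a digit:

```r
columnNames <- c("chain:1.theta[1]", "chain:1.theta[2]",
                 "chain:1.theta[3]", "chain:1.theta[4]")

# perl = TRUE selects PCRE, which supports lookahead (?=...) and lookbehind (?<=...).
#   chain:\\d+\\.(?=[[:alpha:]])  consume "chain:N." only if a letter follows it
#   \\[(?=\\d+\\])                consume "[" only if digits and "]" follow it
#   (?<=\\d)\\]                   consume "]" only if a digit precedes it
cleaned <- gsub("chain:\\d+\\.(?=[[:alpha:]])|\\[(?=\\d+\\])|(?<=\\d)\\]",
                "", columnNames, perl = TRUE)
print(cleaned)
# [1] "theta1" "theta2" "theta3" "theta4"
```

The lookarounds themselves still match nothing; they only restrict where the consuming parts (`chain:\\d+\\.`, `\\[`, `\\]`) are allowed to match.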

1 answer:

Answer (score: 3)

gsub(".*\\.(.*)\\[(\\d+)\\]", "\\1\\2", columnNames)
[1] "theta1" "theta2" "theta3" "theta4"

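Here `.*\\.` matches everything up to and including the dot. In this case `(.*)` captures `theta`, and `(\\d+)` captures the index inside the square brackets. The replacement `"\\1\\2"` joins the two captured groups.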
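To make the capture groups concrete, the sketch below applies the answer's pattern to a few more names. The last two inputs, with a two-digit chain number, a two-digit index and a parameter other than `theta`, are hypothetical. They are included only to show that `(\\d+)` takes all of the digits and that `(.*)` takes whatever name sits between the last dot and the bracket. Because each name contains exactly one match, `sub()` gives the same result as `gsub()` here.

```r
columnNames <- c("chain:1.theta[1]", "chain:1.theta[2]",
                 "chain:12.theta[10]", "chain:3.sigma[7]")  # last two are hypothetical

# .*\\.         greedy: everything up to and including the last dot
# (.*)          group 1: the parameter name, e.g. "theta"
# \\[(\\d+)\\]  group 2: the digits inside the square brackets
# "\\1\\2"      keep only group 1 followed by group 2
renamed <- sub(".*\\.(.*)\\[(\\d+)\\]", "\\1\\2", columnNames)
print(renamed)
```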