awk: gsub /pattern1/, but not /pattern1pattern2/

Date: 2017-09-14 07:53:38

Tags: awk gsub substitution

At work I have to solve a seemingly simple problem: change pattern1 to newpattern, but only where it is not followed by pattern2 or pattern3:

"pattern1 pattern1pattern2 pattern1pattern3 pattern1pattern4" → "newpattern pattern1pattern2 pattern1pattern3 newpatternpattern4"

Here is my solution, but I don't like it. I suspect there is a simpler, more elegant way to do this?

$ echo 'pattern1 pattern1pattern2 pattern1pattern3 pattern1pattern4' | awk '
{gsub(/pattern1pattern2/, "###", $0)
gsub(/pattern1pattern3/, "%%%", $0)
gsub(/pattern1/, "newpattern", $0)
gsub(/###/, "pattern1pattern2", $0)
gsub(/%%%/, "pattern1pattern3", $0)
print}'
newpattern pattern1pattern2 pattern1pattern3 newpatternpattern4

So, the sample input file:

pattern1 pattern1pattern2 aaa_pattern1pattern3 pattern1pattern4 pattern1pattern2pattern1

The sample output file should be:

newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern

3 answers:

Answer 0 (score: 4)

This is trivial in perl, using a negative lookahead:

perl -pe 's/pattern1(?!pattern[23])/newpattern/g' file

This replaces every match of pattern1 that is not followed by pattern2 or pattern3.

If for some reason you need to do it in awk, here is one way you could go about it:

{
    out = ""
    replacement = "newpattern"
    while (match($0, /pattern1/)) {
        if (substr($0, RSTART + RLENGTH) ~ /^pattern[23]/) {
            out = out substr($0, 1, RSTART + RLENGTH - 1)
        }
        else {
            out = out substr($0, 1, RSTART - 1) replacement
        }
        $0 = substr($0, RSTART + RLENGTH)
    }
    print out $0
}

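This consumes the input while pattern1 matches and builds up the string out, inserting the replacement whenever the part after a match is not pattern2 or pattern3. Once there are no more matches, it prints the string built so far, followed by whatever is left of the input.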
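As a sketch only, the same loop can be wrapped in a shell function so that the target, the replacement, and the excluded suffix are passed as parameters. The function name replace_unless_followed and its three arguments are hypothetical and not part of the answer. It assumes the target regex never matches the empty string (otherwise the loop would not terminate) and that the replacement is plain text.

```shell
# Hypothetical helper: replace every match of $1 with $2,
# unless the match is immediately followed by something matching $3.
replace_unless_followed() {
    awk -v pat="$1" -v repl="$2" -v except="$3" '
    {
        out = ""
        while (match($0, pat)) {
            rest = substr($0, RSTART + RLENGTH)      # text after the match
            if (rest ~ ("^(" except ")"))
                out = out substr($0, 1, RSTART + RLENGTH - 1)   # keep the match as-is
            else
                out = out substr($0, 1, RSTART - 1) repl        # substitute it
            $0 = rest                                 # continue with the remainder
        }
        print out $0
    }'
}

echo 'pattern1 pattern1pattern2 aaa_pattern1pattern3 pattern1pattern4 pattern1pattern2pattern1' |
    replace_unless_followed 'pattern1' 'newpattern' 'pattern[23]'
```

On the sample input from the question this should print the expected output line. It uses only match(), substr(), and dynamic regexes, so it should not need GNU awk.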

Answer 1 (score: 2)

With GNU awk for the 4th arg to split():

$ cat tst.awk
{
    split($0,flds,/pattern1(pattern2|pattern3)/,seps)
    for (i=1; i in flds; i++) {
        printf "%s%s", gensub(/pattern1/,"newpattern","g",flds[i]), seps[i]
    }
    print ""
}

$ awk -f tst.awk file
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern

With other awks you can do the same thing with a while(match()) loop:

$ cat tst.awk
{
    while ( match($0,/pattern1(pattern2|pattern3)/) ) {
        tgt = substr($0,1,RSTART-1)
        gsub(/pattern1/,"newpattern",tgt)
        printf "%s%s", tgt, substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    gsub(/pattern1/,"newpattern",$0)
    print
}

$ awk -f tst.awk file
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern

But clearly the gawk solution is simpler and more concise so, as always, get gawk!

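Answer 2 (score: 1)

An awk solution (gensub() requires GNU awk). Good question. Basically it does two gensub() calls: the first replaces every pattern1, the second restores the ones that are followed by pattern2 or pattern3: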

$ cat tst.awk
{ for (i=1; i<=NF; i++){
    s=gensub(/pattern1/, "newpattern", "g", $i);
    t=gensub(/(newpattern)(pattern(2|3))/, "pattern1\\2", "g", s);
    $i=t
  }
}1

Test:

 echo "pattern1 pattern1pattern2 aaa_pattern1pattern3 pattern1pattern4 pattern1pattern2pattern1" | awk -f tst.awk
 newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern

However, it fails as soon as the input contains something like newpatternpattern2. But that is not what the OP implies in his sample input, I think.
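To make that caveat concrete, here is a minimal sketch of the same two-pass idea. It is an adaptation, not the answer's code: it uses plain gsub() on the whole record instead of gensub() per field so that it runs in any POSIX awk, and the function name two_pass is hypothetical.

```shell
# Hypothetical two-pass variant: replace everything, then undo the unwanted cases.
two_pass() {
    awk '{
        gsub(/pattern1/, "newpattern")                   # pass 1: replace every pattern1
        gsub(/newpatternpattern2/, "pattern1pattern2")   # pass 2: undo where pattern2 follows
        gsub(/newpatternpattern3/, "pattern1pattern3")   #         ... or where pattern3 follows
        print
    }'
}

echo 'pattern1pattern2 pattern1pattern4' | two_pass   # prints: pattern1pattern2 newpatternpattern4
echo 'newpatternpattern2' | two_pass                  # prints: pattern1pattern2
```

The second call shows the problem: text that already read newpatternpattern2 before any substitution is indistinguishable from a pass-1 result, so pass 2 rewrites it.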