At work I had to solve a seemingly simple problem: change pattern1 to newpattern, but only where it is not followed by pattern2 or pattern3:
"pattern1 pattern1pattern2 pattern1pattern3 pattern1pattern4" → "newpattern pattern1pattern2 pattern1pattern3 newpatternpattern4"
Here is my solution, but I don't like it. Surely there is a simpler, more elegant way to do this?
$ echo 'pattern1 pattern1pattern2 pattern1pattern3 pattern1pattern4' | awk '
{gsub(/pattern1pattern2/, "###", $0)
gsub(/pattern1pattern3/, "%%%", $0)
gsub(/pattern1/, "newpattern", $0)
gsub(/###/, "pattern1pattern2", $0)
gsub(/%%%/, "pattern1pattern3", $0)
print}'
newpattern pattern1pattern2 pattern1pattern3 newpatternpattern4
So, given this sample input file:
pattern1 pattern1pattern2 aaa_pattern1pattern3 pattern1pattern4 pattern1pattern2pattern1
the output file should be:
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
Answer 0 (score: 4)
This is trivial in perl, using a negative lookahead:
perl -pe 's/pattern1(?!pattern[23])/newpattern/g' file
This replaces every match of pattern1 that is not followed by pattern2 or pattern3.
If for some reason you need to do it in awk, here is one way to go about it:
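{
    out = ""
    replacement = "newpattern"
    # consume $0 one pattern1 match at a time
    while (match($0, /pattern1/)) {
        if (substr($0, RSTART + RLENGTH) ~ /^pattern[23]/) {
            # followed by pattern2 or pattern3: copy the match unchanged
            out = out substr($0, 1, RSTART + RLENGTH - 1)
        }
        else {
            # otherwise copy the text before the match, then the replacement
            out = out substr($0, 1, RSTART - 1) replacement
        }
        # drop everything up to the end of this match
        $0 = substr($0, RSTART + RLENGTH)
    }
    print out $0
}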
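This consumes the input for as long as pattern1 matches and builds up the string out, inserting the replacement whenever the text right after a match is not pattern2 or pattern3. Once there are no more matches, it prints the string built so far, followed by whatever is left of the input.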
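To make the loop easier to follow, here is a sketch of the same logic wrapped in a shell function that also prints a trace of out and the remaining input on stderr after each iteration. The function name `replace_pattern1` and the trace line are additions for illustration, not part of the answer, and the sketch assumes `/dev/stderr` is writable, as it is on typical Linux systems.

```shell
# Hypothetical wrapper around the awk loop, with a per-iteration trace.
replace_pattern1() {
  awk '{
    out = ""
    while (match($0, /pattern1/)) {
      # text that follows the current match
      rest = substr($0, RSTART + RLENGTH)
      if (rest ~ /^pattern[23]/)
        out = out substr($0, 1, RSTART + RLENGTH - 1)   # keep pattern1 as is
      else
        out = out substr($0, 1, RSTART - 1) "newpattern" # swap in replacement
      $0 = rest
      # trace only: show what has been built and what is still unread
      printf "out=[%s] remaining=[%s]\n", out, $0 > "/dev/stderr"
    }
    print out $0
  }'
}

printf '%s\n' 'pattern1pattern2 pattern1x' | replace_pattern1
```

The trace goes to stderr, so redirecting it with `2>/dev/null` leaves only the rewritten line on stdout.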
Answer 1 (score: 2)
With GNU awk, using the 4th arg to split():
$ cat tst.awk
{
split($0,flds,/pattern1(pattern2|pattern3)/,seps)
for (i=1; i in flds; i++) {
printf "%s%s", gensub(/pattern1/,"newpattern","g",flds[i]), seps[i]
}
print ""
}
$ awk -f tst.awk file
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
With other awks you can do the same thing with a while(match()) loop:
$ cat tst.awk
{
while ( match($0,/pattern1(pattern2|pattern3)/) ) {
tgt = substr($0,1,RSTART-1)
gsub(/pattern1/,"newpattern",tgt)
printf "%s%s", tgt, substr($0,RSTART,RLENGTH)
$0 = substr($0,RSTART+RLENGTH)
}
gsub(/pattern1/,"newpattern",$0)
print
}
$ awk -f tst.awk file
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
But clearly the gawk solution is simpler and more concise so, as always, get gawk!
Answer 2 (score: 1)
An awk solution (gensub() requires GNU awk). Nice question. Basically it does two gensubs:
$ cat tst.awk
{ for (i=1; i<=NF; i++){
s=gensub(/pattern1/, "newpattern", "g", $i);
t=gensub(/(newpattern)(pattern(2|3))/, "pattern1\\2", "g", s);
$i=t
}
}1
Test:
echo "pattern1 pattern1pattern2 aaa_pattern1pattern3 pattern1pattern4 pattern1pattern2pattern1" | awk -f tst.awk
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
However, this fails whenever the input already contains something like newpatternpattern2. I don't think that is what the OP's sample input implies, though.
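The failure case can be reproduced with a small sketch. Note the assumptions: it uses plain gsub() on the whole line instead of gensub() on each field so that it runs in any POSIX awk, and the function name `two_pass` is made up for illustration. The replace-then-restore idea is the same.

```shell
# Hypothetical portable variant of the two-pass approach.
two_pass() {
  awk '{
    gsub(/pattern1/, "newpattern")                   # pass 1: replace everywhere
    gsub(/newpatternpattern2/, "pattern1pattern2")   # pass 2: undo before pattern2
    gsub(/newpatternpattern3/, "pattern1pattern3")   #         and before pattern3
    print
  }'
}

# The second word never contained pattern1, so it should come out unchanged.
input='pattern1 newpatternpattern2'
two_pass_result=$(printf '%s\n' "$input" | two_pass)
printf '%s\n' "$two_pass_result"
```

The second pass cannot tell a newpattern it just produced from one that was already in the input, so the literal newpatternpattern2 is rewritten to pattern1pattern2.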