Sed doesn't replace all occurrences in a file when the matches overlap

Date: 2012-01-06 01:20:54

Tags: regex linux bash sed

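I need to replace several words with other words.

For example: replace "apple" with "FRUIT" in file, but only in the following 4 cases:

  • _apple_, with a space before and after.
  • [apple_, with an opening square bracket before and a space after.
  • _apple], with a space before and a closing square bracket after.
  • [apple], with square brackets before and after.

I don't want the replacement to happen in any other case.

I tried the following code: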

a="apple"
b="fruit"
sed -i "s/ $a / $b /g" ./file
sed -i "s/\[$a /\[$b /g" ./file
sed -i "s/ $a\]/ $b\]/g" ./file
sed -i "s/\[$a\]/\[$b\]/g" ./file

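I thought the trailing "g" option meant every instance would be replaced, but it turns out this is not a complete solution. For example, if the file contains: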

apple spider apple apple spider tree apple tree

the third occurrence of "apple" is not replaced. Likewise, several occurrences of the word are left unchanged here:

apple  spider apple apple apple apple apple spider tree apple tree

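I suspect this is because neighbouring matches share a space.

How can I find and replace every instance of $a with $b, whether or not the matches overlap?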
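The shared-space effect suspected above can be reproduced with a small sketch, assuming a POSIX shell. The function name single_pass is made up for illustration; it runs the same four substitutions as one sed filter on stdin instead of four sed -i passes, with the redundant backslashes before "[" in the replacement and before "]" dropped.

```shell
#!/bin/sh
# single_pass (hypothetical name): the question's four substitutions, run as
# one sed filter on stdin instead of four "sed -i" passes over ./file.
# Assumes $1 and $2 contain no sed metacharacters and no '/'.
single_pass() {
  sed -e "s/ $1 / $2 /g" \
      -e "s/\[$1 /[$2 /g" \
      -e "s/ $1]/ $2]/g" \
      -e "s/\[$1]/[$2]/g"
}

# Each " apple " match consumes its trailing space, so an "apple" right
# after it has no leading space left and the g scan skips it.
printf '%s\n' 'apple spider apple apple spider tree apple tree' | single_pass apple fruit
# prints: apple spider fruit apple spider tree fruit tree
```

The g flag resumes scanning after the end of the previous match, so the space between "apple apple" belongs to the first match only.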

5 answers:

Answer 0 (score: 3)

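You can do this with backreferences. The expressions should be fully POSIX-compliant (the -i option itself is not part of POSIX sed):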

sed -i 's/^badger\([] ]\)/SNAKE\1/g
        s/\([[ ]\)badger$/\1SNAKE/g
        s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g
        s/ badger]/ SNAKE]/g' ./infile

Example:

$ sed 's/^badger\([] ]\)/SNAKE\1/g;s/\([[ ]\)badger$/\1SNAKE/g;s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g;s/ badger]/ SNAKE]/g' <<<"badger [badger badger] [badger] badger foobadger badgering mushroom badger"
SNAKE [SNAKE SNAKE] [SNAKE] SNAKE foobadger badgering mushroom SNAKE

Answer 1 (score: 3)

A quick and dirty solution is to run the substitution twice.

$ echo apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g; s/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT FRUIT apple[FRUIT FRUIT]

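This is safe because, after the first command, the resulting text cannot contain any (\[| )apple( |\]) match that was not already present in the original text.

The downside is that running the substitution twice takes roughly twice as long.

If you split it into two separate sed runs, you can see the steps more clearly: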

$ echo apple apple apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT apple FRUIT apple apple[FRUIT apple]

$ echo apple FRUIT apple FRUIT apple apple[FRUIT apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT FRUIT FRUIT FRUIT apple[FRUIT FRUIT]
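Instead of running the command a fixed number of times, sed can also loop until nothing more matches. This is a sketch that is not part of the original answer: it uses the :a label and the t branch command to repeat a non-global substitution. The name replace_loop is hypothetical, and the bracket expressions [[ ] and [] ] replace the GNU-only \| alternation, so the script should also run with non-GNU sed.

```shell
#!/bin/sh
# replace_loop (hypothetical name): replace the first remaining match, then
# branch back to label "a" as long as the substitution succeeded.
# [[ ] matches a space or '['; [] ] matches a space or ']'.
# Assumes $2 cannot create a new match for $1 (otherwise the loop never
# ends) and that neither argument contains sed metacharacters or '/'.
replace_loop() {
  sed -e ':a' \
      -e "s/\([[ ]\)$1\([] ]\)/\1$2\2/" \
      -e 'ta'
}

echo 'apple apple apple apple[apple apple]' | replace_loop apple FRUIT
# prints: apple FRUIT FRUIT apple[FRUIT FRUIT]
```

Each iteration rescans the line from the start, so a line with many matches costs more than two global passes; in exchange, no pass count has to be chosen.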

Answer 2 (score: 2)

sed -i "s/\bapple\b/FRUIT/g" file

\b matches a word boundary. It may not be fully portable; at least, it doesn't work on Mac OS X.

A more interesting test:

$ cat file; sed "s/\bapple\b/FRUIT/g" file
apple apple apple spider tree apple tree applejuice pineapple apple.com etc
FRUIT FRUIT FRUIT spider tree FRUIT tree applejuice pineapple FRUIT.com etc

Answer 3 (score: 1)

Consider using lookahead and lookbehind:

s/(?<=[\s\[])apple(?=[\s\]])/FRUIT/g

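Demo: http://regexr.com?2vl8p

OK, I have now tested the regex on my computer and noticed that lookahead and lookbehind don't work in standard sed, but they do work with ssed using the --regexp-perl option: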

uname -msrv
Darwin 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug  9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64
ssed --ver
super-sed version 3.62
based on GNU sed version 4.1

Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
ssed -R 's/(?<=[\s\[])apple(?=[\s\]])/FRUIT/g'
apple spider apple apple spider tree apple tree
apple spider FRUIT FRUIT spider tree FRUIT tree

Answer 4 (score: 1)

One way to do it with sed:

sed "s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g; s/\( \|\[\)$a\( \|\]\)/\1$b\2/g; s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g" file

There are three substitution commands. Explanation:

s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g      # Duplicate each space character surrounded with non-space 
                                          # characters.
s/\( \|\[\)$a\( \|\]\)/\1$b\2/g           # Substitute content of variable '$a' when just before there is a 
                                          # blank or '[' and just after another space or ']'. Any combination
                                          # of those. And replace with content of variable '$b' and same
                                          # groups of the pattern (\1 and \2).
s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g    # Remove one space from each pair of consecutive spaces
                                          # surrounded by non-space characters.
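To see why duplicating the spaces helps, the three commands can be run one at a time. In this sketch the function names are hypothetical, and the GNU groups \( \|\[\) and \( \|\]\) are written as the equivalent POSIX bracket expressions \([[ ]\) and \([] ]\). After the first step neighbouring words are separated by two spaces, so each candidate has its own delimiter and the global substitution no longer skips any.

```shell
#!/bin/sh
# Step 1: double a single space between two non-space characters.
step_double()   { sed 's/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g'; }
# Step 2: replace $1 with $2 when preceded by space/'[' and followed by space/']'.
step_replace()  { sed "s/\([[ ]\)$1\([] ]\)/\1$2\2/g"; }
# Step 3: collapse a double space between two non-space characters back to one.
step_collapse() { sed 's/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g'; }

line='apple spider apple apple spider tree apple tree'
printf '%s\n' "$line" | step_double
# apple  spider  apple  apple  spider  tree  apple  tree
printf '%s\n' "$line" | step_double | step_replace apple fruit
# apple  spider  fruit  fruit  spider  tree  fruit  tree
printf '%s\n' "$line" | step_double | step_replace apple fruit | step_collapse
# apple spider fruit fruit spider tree fruit tree
```

Single-character words break the first step: on "x a y" it only doubles the first space, because the match "x a" consumes the "a".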

My test:

Contents of the file:

apple spider apple apple spider tree apple tree
apple spider [apple apple spider tree apple] tree
apple spider apple apple spider tree appletree
apple spider apple apple spider tree [apple] tree
apple  spider apple apple apple apple apple spider tree apple tree

Set the variables:

a="apple"
b="fruit"

Run the sed command:

sed "s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g; s/\( \|\[\)$a\( \|\]\)/\1$b\2/g; s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g" file

Result:

apple spider fruit fruit spider tree fruit tree
apple spider [fruit fruit spider tree fruit] tree
apple spider fruit fruit spider tree appletree
apple spider fruit fruit spider tree [fruit] tree
apple spider fruit fruit fruit fruit fruit spider tree fruit tree

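This won't work if your real file has a different distribution of whitespace or unusual formatting. In that case sed is a limited tool, and you would be better off with perl or a similar tool that supports lookahead and lookbehind.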