Why does a space cause the remembered pattern in sed to output different things?

Time: 2016-07-11 20:32:23

Tags: regex bash shell sed regex-greedy

I'm trying to get the value of the value attribute in this XML line from the terminal, so I'm using sed.

abcs='<param name="abc" value="bob3" no_but_why="4"/>'

echo "$abcs" | sed -e 's/.*value="\(.*\)" .*/\1/'
echo "$abcs" | sed -e 's/.*value="\(.*\)".*/\1/'

The output is:

bob3
bob3" no_but_why="4

Why does the second command, without the space, print more than just the value I wanted? Why would \1 be affected by that?

1 answer:

Answer 0 (score: 8)

The difference comes from how far the greedy pattern .* inside \(.*\) is allowed to reach. In the second regex the closing " is followed directly by .*, with no space in between.

There is another double quote after no_but_why= as well, and since .* is greedy, in the second regex the group matches up to the last " in the input, the one just before />.

In your first regex, "\(.*\)" must be followed by a space. The last " in the input is followed by /, not a space, so the regex engine backtracks to the last " that is followed by a space, which is the one right after bob3. That is why only bob3 is captured.
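One way to check this explanation is to add a space before /> and rerun the first regex. The sketch below assumes a variant input, abcs2, which is a name made up here and not part of the question. With that extra space, the last " in the line is followed by a space too, so the first regex overmatches as well:

```shell
# Original input: the last double quote is followed by "/", not a space.
abcs='<param name="abc" value="bob3" no_but_why="4"/>'
# Hypothetical variant: the same line with a space before "/>".
abcs2='<param name="abc" value="bob3" no_but_why="4" />'

# The greedy \(.*\) backs off only as far as the LAST quote-plus-space pair.
first=$(printf '%s\n' "$abcs" | sed -e 's/.*value="\(.*\)" .*/\1/')
second=$(printf '%s\n' "$abcs2" | sed -e 's/.*value="\(.*\)" .*/\1/')

printf '%s\n' "$first"    # bob3
printf '%s\n' "$second"   # bob3" no_but_why="4
```

So the first regex worked on the original line only because of where the spaces happen to fall in that input.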

To avoid this, use a negated character class instead of the greedy .* pattern.

Consider these sed command examples:

sed -e 's/.*value="\([^"]*\)" .*/\1/' <<< "$abcs"
bob3

sed -e 's/.*value="\([^"]*\)".*/\1/' <<< "$abcs"
bob3

Now both commands produce the same output, bob3, because the negated character class [^"]* matches only up to the next ", not up to the very last " in the input as .* does.
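If several attributes need to be read this way, the negated character class can be wrapped in a small function. This is only a sketch: attr_value is a hypothetical helper name, and it assumes the attribute sits on a single line, is preceded by a space, uses double quotes, and that the attribute name contains no regex metacharacters or slashes.

```shell
# attr_value NAME LINE
# Hypothetical helper: print the value of attribute NAME found in LINE.
# Prints nothing if the attribute is absent (-n plus the p flag).
attr_value() {
  printf '%s\n' "$2" | sed -n 's/.* '"$1"'="\([^"]*\)".*/\1/p'
}

abcs='<param name="abc" value="bob3" no_but_why="4"/>'
attr_value value "$abcs"        # bob3
attr_value no_but_why "$abcs"   # 4
```

Requiring a space before the name keeps a lookup for one attribute from matching the tail of a longer attribute name.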