Is there any way to tell sed to output only captured groups? For example, given the input:
This is a sample 123 text and some 987 numbers
and the pattern:
/([\d]+)/
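could I get only 123 and 987 as output, formatted the way back-references allow?
Answer 0 (score: 279)
The key to getting this to work is to tell sed to exclude what you don't want in the output as well as specifying what you do want.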
string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
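This says:
- don't print each line by default (-n)
- exclude zero or more non-digits
- include one or more digits
- exclude one or more non-digits
- include one or more digits
- exclude zero or more non-digits
- print the substitution (p)
In general, in sed you capture groups using parentheses and output what you captured using a back-reference: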
echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'
will output "bar". If you use -r (-E on OS X) for extended regular expressions, you don't need to escape the parentheses:
echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'
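There can be up to 9 capture groups and their back-references. Back-references are numbered in the order the groups appear, but they can be used in any order and can be repeated: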
echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'
outputs "a bar a".
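As a small sketch of reordering back-references on something less artificial, the following rewrites a date from year-month-day to day.month.year. The input value and the variable names are made up for illustration, and -E is assumed to be available (use -r on older GNU sed).

```shell
# Hypothetical input: an ISO date to be rewritten as day.month.year.
date_iso='2024-03-15'

# Three capture groups, emitted in reverse order in the replacement.
date_dmy=$(printf '%s\n' "$date_iso" |
  sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/\3.\2.\1/')

printf '%s\n' "$date_dmy"   # prints 15.03.2024
```

Because the pattern is anchored with ^ and $, a line that is not a date in this form is left unchanged rather than partially rewritten.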
If you have GNU grep (it may also work on BSD, including OS X, depending on the grep version):
echo "$string" | grep -Po '\d+'
or variations such as:
echo "$string" | grep -Po '(?<=\D )(\d+)'
The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.
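Answer 1 (score: 51)
Sed can remember up to nine patterns, but you need to use escaped parentheses to remember portions of the regular expression.
For examples and more details, see here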
Answer 2 (score: 29)
You could use grep:
grep -Eow "[0-9]+" file
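To show what the -w flag contributes here, this sketch runs the same pattern with and without it on a made-up sample string. It assumes a grep that supports -o (GNU and BSD grep both do); the variable names are illustrative only.

```shell
# Hypothetical sample: digits both as whole words and embedded in words.
sample='abc123 456 x789y'

# -w keeps only matches that are delimited by non-word characters.
with_w=$(printf '%s\n' "$sample" | grep -Eow '[0-9]+')

# Without -w, every run of digits is reported, one per line.
without_w=$(printf '%s\n' "$sample" | grep -Eo '[0-9]+')

printf 'with -w:\n%s\n' "$with_w"         # 456
printf 'without -w:\n%s\n' "$without_w"   # 123, 456, 789 on separate lines
```

So -w is the right choice only when numbers glued to letters should be ignored; drop it to extract every digit run.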
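Answer 3 (score: 8)
I assume the pattern given in the question was only an example, and that the goal is to match any pattern.
If you have a sed with the GNU extension that allows inserting a newline in the pattern space, one suggestion is: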
> set string = "This is a sample 123 text and some 987 numbers"
>
> set pattern = "[0-9][0-9]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
123
987
> set pattern = "[a-z][a-z]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
his
is
a
sample
text
and
some
numbers
These examples were run with tcsh under CYGWIN (yes, I know it is the wrong shell). (Edit: for bash, remove set and the spaces around =.)
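For reference, here is roughly how the first example looks in a POSIX-style shell such as bash. This is only a sketch and it still assumes GNU sed, since \n in the replacement is a GNU extension; the variable name matches is introduced here for illustration.

```shell
string='This is a sample 123 text and some 987 numbers'
pattern='[0-9][0-9]*'

# Step 1: put every match on a line of its own (GNU sed: \n in the replacement).
# Step 2: keep only the lines that contain a match.
matches=$(printf '%s\n' "$string" |
  sed "s/$pattern/\n&\n/g" |
  sed -n "/$pattern/p")

printf '%s\n' "$matches"   # 123 and 987, one per line
```

Note that the pattern is interpolated into the sed script, so it must not contain an unescaped / character.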
Answer 4 (score: 7)
This answer works with any count of digit groups. Example:
$ echo 'Num123that456are7899900contained0018166intext' |
> sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166
Is there any way to tell sed to output only captured groups?
Yes. Replace all the text with the capture group:
$ echo 'Number 123 inside text' | sed 's/[^0-9]*\([0-9]\{1,\}\)[^0-9]*/\1/'
123
s/[^0-9]* # several non-digits
\([0-9]\{1,\}\) # followed by one or more digits
[^0-9]* # and followed by more non-digits.
/\1/ # gets replaced only by the digits.
Or with extended syntax (fewer backslashes, and + is allowed):
$ echo 'Number 123 in text' | sed -E 's/[^0-9]*([0-9]+)[^0-9]*/\1/'
123
To avoid printing the original text when there is no number, use:
$ echo 'Number xxx in text' | sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1/p'
And to match several numbers (and also print them):
$ echo 'N 123 in 456 text' | sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1 /gp'
123 456
That works for any count of digit runs:
$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" | sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166
Which is very similar to the grep command:
$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" | grep -Po '\d+'
123
456
7899900
0018166
and pattern:
/([\d]+)/
Sed does not recognize the '\d' (shortcut) syntax. The ASCII equivalent used above, [0-9], is not exactly equivalent. The only alternative solution is to use a character class: '[[:digit:]]'.
The selected answer uses such "character classes" to build a solution:
$ str='This is a sample 123 text and some 987 numbers'
$ echo "$str" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
That solution only works for (exactly) two runs of digits.
Of course, since the answer is executed inside the shell, we can define a couple of variables to make such an answer shorter:
$ str='This is a sample 123 text and some 987 numbers'
$ d=[[:digit:]] D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D+($d+)$D*/\1 \2/p"
But, as has already been explained, using a s/…/…/gp command is better:
$ str='This is 75577 a sam33ple 123 text and some 987 numbers'
$ d=[[:digit:]] D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D*/\1 /gp"
75577 33 123 987
That will cover both repeated runs of digits and writing a short(er) command.
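One detail of the s/…/\1 /gp form is that every match is followed by a space, so the output line ends with a trailing space. If that matters, a second sed can strip it. A minimal sketch, with variable names chosen only for illustration:

```shell
str='N 123 in 456 text'

# Output of the command as used above: note the space after the last number.
raw=$(printf '%s\n' "$str" |
  sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1 /gp')

# Same command, followed by a second sed that removes one trailing space.
trimmed=$(printf '%s\n' "$str" |
  sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1 /gp' |
  sed 's/ $//')

printf '[%s]\n' "$raw" "$trimmed"   # prints [123 456 ] then [123 456]
```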
Answer 5 (score: 5)
Try
sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"
Under cygwin I got this:
$ (echo "asdf"; \
echo "1234"; \
echo "asdf1234adsf1234asdf"; \
echo "1m2m3m4m5m6m7m8m9m0m1m2m3m4m5m6m7m8m9") | \
sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"
1234
1234 1234
1 2 3 4 5 6 7 8 9
$
Answer 6 (score: 2)
It's not what the OP asked for (capturing groups), but you can extract the numbers using:
S='This is a sample 123 text and some 987 numbers'
echo "$S" | sed 's/ /\n/g' | sed -r '/([0-9]+)/ !d'
which gives the following:
123
987
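One consequence of splitting on spaces is that a whole token is kept whenever it contains a digit, not just the digits themselves. The sketch below shows this on a made-up string; it swaps in tr for the first sed so that it does not depend on the GNU \n extension, which is a substitution of mine rather than part of the answer.

```shell
# Hypothetical input with digits embedded in a word ("sam33ple").
S='This is 75577 a sam33ple 123 text'

# Split on spaces with tr, then keep only the tokens that contain a digit.
tokens=$(printf '%s\n' "$S" | tr ' ' '\n' | sed -n '/[0-9]/p')

printf '%s\n' "$tokens"   # 75577, sam33ple and 123, one per line
```

For clean numeric output from such input, one of the capture-group or grep -o approaches from the other answers is the safer choice.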
Answer 7 (score: 0)
You can use ripgrep, which also seems to work as a sed replacement for simple substitutions, like this:
rg '(\d+)' -or '$1'
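where ripgrep uses -o (or --only-matching) and -r (or --replace) to output only the first capture group, $1 (quoted so that the shell does not interpret it as a variable), twice because there are two matches.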
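Answer 8 (score: 0)
I want to give a simpler example of "output only captured groups with sed".
I have /home/me/myfile-99
and want to output the serial number of the file: 99
My first try, which didn't work, was: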
echo "/home/me/myfile-99" | sed -r 's/myfile-(.*)$/\1/'
# output: /home/me/99
To make this work, the pattern also has to match the unwanted part, here by capturing it in a group of its own:
echo "/home/me/myfile-99" | sed -r 's/^(.*)myfile-(.*)$/\2/'
# output: 99
*) Note that sed has no \d
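Strictly speaking, the unwanted prefix only has to be matched so that the substitution replaces it; it does not need a capture group of its own. A minimal variant under that assumption, using [0-9]+ instead of .* for the serial number (the variable names are illustrative):

```shell
path='/home/me/myfile-99'

# ^.* consumes the unwanted prefix; only the digits are captured as \1.
serial=$(printf '%s\n' "$path" | sed -E 's/^.*myfile-([0-9]+)$/\1/')

printf '%s\n' "$serial"   # prints 99
```

With a single group, the back-reference stays \1 no matter how the prefix pattern changes.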