
Question

Is there any way to tell sed to output only captured groups? For example, given the input:

This is a sample 123 text and some 987 numbers

and the pattern:

/([\d]+)/

could I get only 123 and 987 as output, formatted by way of back-references?

Answer 1

The key to getting this to work is to tell sed to exclude what you don't want in the output, as well as specifying what you do want.

string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

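This says:

don't print each line by default (-n)
exclude zero or more non-digits
include one or more digits
exclude one or more non-digits
include one or more digits
exclude zero or more non-digits
print the substitution (p)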
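The steps above can be wrapped in a small function to see the effect of -n and p together. This is only a sketch: the name two_numbers is hypothetical, and -E is used in place of -r on the assumption that your sed accepts it (GNU and BSD sed both do).

```shell
# Hypothetical helper: print the first two runs of digits found on each line.
# Lines that do not contain two runs of digits produce no output (-n plus p).
two_numbers() {
  sed -En 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
}

printf '%s\n' 'This is a sample 123 text and some 987 numbers' 'no digits here' | two_numbers
# prints: 123 987
```

The second input line is silently dropped because the substitution never succeeds on it, so the p flag never fires.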

In general, in sed you capture groups using parentheses and output what you captured using a back-reference:

echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'

will output "bar". If you use -r (-E on OS X) for extended regular expressions, you don't need to escape the parentheses:

echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'

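There can be up to 9 capture groups and their back-references. Back-references are numbered in the order the groups appear, but they can be used in any order and can be repeated: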

echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'

outputs "a bar a".
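Reordering back-references is handy for reformatting fields. A minimal sketch, assuming each input line is a date in YYYY-MM-DD form; the function name reorder_date is hypothetical, and | is used as the s delimiter so the slashes in the replacement need no escaping:

```shell
# Hypothetical helper: turn YYYY-MM-DD into DD/MM/YYYY by reordering groups.
reorder_date() {
  sed -E 's|^([0-9]{4})-([0-9]{2})-([0-9]{2})$|\3/\2/\1|'
}

echo '2024-01-31' | reorder_date
# prints: 31/01/2024
```

Lines that do not match the pattern pass through unchanged, because neither -n nor p is used here.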

If you have GNU grep (it may also work in BSD, including OS X):

echo "$string" | grep -Po '\d+'

or variations such as:

echo "$string" | grep -Po '(?<=\D )(\d+)'

The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.
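With -P, the PCRE \K escape gives something close to "print only the group": everything matched before \K is dropped from the reported match. The sketch below assumes a GNU grep built with PCRE support; the function name number_after is hypothetical.

```shell
# Hypothetical helper: print the number that follows the given word and a space.
# \K discards the text matched so far, so -o prints only the digits.
number_after() {
  grep -Po -- "$1"' \K\d+'
}

echo 'This is a sample 123 text and some 987 numbers' | number_after sample
# prints: 123
```

Unlike a lookbehind, the part before \K may have variable length, which makes it convenient when the leading context is itself a pattern.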

Answer 2

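Sed has up to nine remembered patterns, but you need to use escaped parentheses to remember portions of the regular expression.

See here for examples and more detail.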

Answer 3

You can use grep:

grep -Eow "[0-9]+" file
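The -w flag matters here: it only accepts matches that form whole words, so digits embedded in a longer word are skipped. A short sketch comparing the two behaviours (both function names are hypothetical, and the input is read from stdin rather than from a file):

```shell
# Hypothetical helpers: extract numbers with and without the whole-word rule.
whole_word_numbers() { grep -Eow '[0-9]+'; }
any_numbers()        { grep -Eo  '[0-9]+'; }

echo 'Test 123 as0018166df 987' | whole_word_numbers   # 123 and 987, one per line
echo 'Test 123 as0018166df 987' | any_numbers          # 123, 0018166 and 987
```

Drop -w if digits glued to letters should also be reported.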

Answer 4

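I believe the pattern given in the question was only an example, and the goal is to match any pattern.

If you have a sed with the GNU extension that allows inserting a newline into the pattern space, one suggestion is: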

> set string = "This is a sample 123 text and some 987 numbers"
>
> set pattern = "[0-9][0-9]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
123
987
> set pattern = "[a-z][a-z]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
his
is
a
sample
text
and
some
numbers

These examples were run with tcsh (yes, I know it's the wrong shell) under CYGWIN. (Edit: for bash, remove set and the spaces around =.)
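For a Bourne-style shell, the same two-stage pipeline might look like the sketch below. It assumes GNU sed (for \n in the replacement) and a pattern that contains no / character; the function name matches_only is hypothetical.

```shell
# Hypothetical helper: put every match of the basic regex $1 on its own line,
# then keep only the lines that contain a match. Assumes GNU sed.
matches_only() {
  sed "s/$1/\n&\n/g" | sed -n "/$1/p"
}

string='This is a sample 123 text and some 987 numbers'
echo "$string" | matches_only '[0-9][0-9]*'
# prints 123 and 987 on separate lines
```

Quoting "$string" keeps the shell from splitting or globbing the input before sed sees it.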

Answer 5

run(s) of digits

This answer works with any count of digit groups. Example:

$ echo 'Num123that456are7899900contained0018166intext' |
> sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166

Expanded answer.

Is there any way to tell sed to output only captured groups?

Yes. Replace all the text with the capture group:

$ echo 'Number 123 inside text' | sed 's/[^0-9]*\([0-9]\{1,\}\)[^0-9]*/\1/'
123

s/[^0-9]*                           # several non-digits
         \([0-9]\{1,\}\)            # followed by one or more digits
                        [^0-9]*     # and followed by more non-digits.
                               /\1/ # gets replaced only by the digits.

Or with extended syntax (fewer backslashes, and + is allowed):

$ echo 'Number 123 in text' | sed -E 's/[^0-9]*([0-9]+)[^0-9]*/\1/'
123

To avoid printing the original text when there is no number, use:

$ echo 'Number xxx in text' | sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1/p'

(-n) Do not print the input by default.
(/p) print only if a replacement was done.

And to match several numbers (and also print them):

$ echo 'N 123 in 456 text' | sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1 /gp'
123 456

That works for any count of digit runs:

$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" | sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166

Which is very similar to the grep command:

$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" | grep -Po '\d+'
123
456
7899900
0018166
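One difference from grep: the sed substitution writes a space after every number, so its output line ends with a trailing space. A possible way to trim it, sketched with a hypothetical function name digit_runs:

```shell
# Hypothetical helper: print all runs of digits on a line, space-separated,
# without the trailing space. Lines without digits produce no output.
digit_runs() {
  sed -En '/[0-9]/{
    s/[^0-9]*([0-9]+)[^0-9]*/\1 /g
    s/ $//
    p
  }'
}

echo 'Test Num(s) 123 456 7899900 contained as0018166df in text' | digit_runs
# prints: 123 456 7899900 0018166
```

The /[0-9]/ address replaces the p flag on the substitution: the block only runs, and only prints, for lines that contain at least one digit.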

About \d

and pattern: /([\d]+)/

Sed does not recognize the '\d' (shortcut) syntax. The ASCII range [0-9] used above is not exactly equivalent. The only alternative is to use a character class: '[[:digit:]]'.

The selected answer uses such "character classes" to build a solution:

$ str='This is a sample 123 text and some 987 numbers'
$ echo "$str" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

That solution only works for (exactly) two runs of digits.

Of course, since the command runs inside the shell, we can define a couple of variables to make it shorter:

$ str='This is a sample 123 text and some 987 numbers'
$ d=[[:digit:]]     D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D+($d+)$D*/\1 \2/p"

But, as has already been explained, using an s/…/…/gp command is better:

$ str='This is 75577 a sam33ple 123 text and some 987 numbers'
$ d=[[:digit:]]     D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D*/\1 /gp"
75577 33 123 987

That covers both repeated runs of digits and a short(er) command.

Answer 6

Try

sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"

I got this under Cygwin:

$ (echo "asdf"; \
   echo "1234"; \
   echo "asdf1234adsf1234asdf"; \
   echo "1m2m3m4m5m6m7m8m9m0m1m2m3m4m5m6m7m8m9") | \
  sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"

1234
1234 1234
1 2 3 4 5 6 7 8 9
$

Answer 7

This is not what the OP asked for (capture groups), but you can extract the numbers using:

S='This is a sample 123 text and some 987 numbers'
echo "$S" | sed 's/ /\n/g' | sed -r '/([0-9]+)/ !d'

which gives the following:

123
987
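Note that this approach keeps any word that merely contains a digit (for example sam33ple). If only purely numeric words are wanted, an anchored pattern is one option. A sketch, using tr to split on spaces so that no GNU-specific \n replacement is needed; the function name numeric_words is hypothetical:

```shell
# Hypothetical helper: split the input on spaces and keep only the words
# that consist entirely of digits.
numeric_words() {
  tr -s ' ' '\n' | sed -En '/^[0-9]+$/p'
}

echo 'This is a sam33ple 123 text and some 987 numbers' | numeric_words
# prints 123 and 987 on separate lines
```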

Answer 8

You can use ripgrep, which also seems to work as a sed replacement for simple substitutions, like this:

rg '(\d+)' -or '$1'

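Here ripgrep uses -o (--only-matching) and -r (--replace) to output only the first capture group, $1 (quoted so that the shell does not interpret it as a variable). It is printed twice because there are two matches.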

Answer 9

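I want to give a simpler example of "output only captured groups with sed".

I have /home/me/myfile-99 and want to output the serial number of the file: 99

My first try, which didn't work, was: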

echo "/home/me/myfile-99" | sed -r 's/myfile-(.*)$/\1/'
# output: /home/me/99

To make this work, we also need to capture the unwanted portion in a capture group:

echo "/home/me/myfile-99" | sed -r 's/^(.*)myfile-(.*)$/\2/'
# output: 99
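The first attempt fails because sed replaces only the matched text, so the unmatched /home/me/ prefix is left in place. The prefix has to be matched, but it does not strictly need its own group. A sketch under that assumption, with a hypothetical function name serial_of, which also prints nothing for paths that do not fit:

```shell
# Hypothetical helper: print the trailing number of a path ending in myfile-N.
# The leading .* consumes the unwanted prefix; -n with p suppresses non-matches.
serial_of() {
  sed -En 's/^.*myfile-([0-9]+)$/\1/p'
}

echo "/home/me/myfile-99" | serial_of
# prints: 99
```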

*) Note that sed does not have \d

How to output only captured groups with sed?
