Question

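Can I use sed if I need to extract a pattern that is enclosed by a specific pattern, when it exists in a line?

Suppose I have a file with the following lines:

There are many who dare not kill themselves for [/fear/] of what the neighbors will say.

Advice is what we ask for when we already know the /* answer */ but wish we didn't.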

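In both cases I have to scan the line for the first occurring pattern, i.e. '[/' or '/*' in their respective cases, and store the text that follows until the exit pattern, i.e. '/]' or '*/' respectively.

In short, I need fear and answer. If possible, can this be extended to multiple lines, in the sense that the exit pattern may occur on a different line than the opening one?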

Any kind of help in the form of suggestions or algorithms is welcome. Thanks in advance for your replies.

Answer 1

use strict;
use warnings;

while (<DATA>) {
    while (m#/(\*?)(.*?)\1/#g) {
        print "$2\n";
    }
}


__DATA__
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.

As a one-liner:

perl -nlwe 'while (m#/(\*?)(.*?)\1/#g) { print $2 }' input.txt

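The inner while loop uses the /g modifier to iterate over all matches in a line. The backreference \1 ensures that we only match identical opening and closing tags.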

If you need to match blocks that extend over multiple lines, you need to slurp the input:

use strict;
use warnings;

$/ = undef;
while (<DATA>) {
    while (m#/(\*?)(.*?)\1/#sg) {
        print "$2\n";
    }
}

__DATA__
    There are many who dare not kill themselves for [/fear/] of what the neighbors will say. /* foofer */ 
    Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
foo bar /
baz 
baaz / fooz

As a one-liner:

perl -0777 -nwe 'while (m#/(\*?)(.*?)\1/#sg) { print "$2\n" }' input.txt

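The -0777 switch and $/ = undef both cause file slurping, meaning the entire file is read into a single scalar. I also added the /s modifier so that the wildcard . matches newlines.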

Explanation of the regex: m#/(\*?)(.*?)\1/#sg

m#              # a simple m//, but with # as delimiter instead of slash
    /(\*?)      # slash followed by optional *
        (.*?)   # shortest possible string of wildcard characters
    \1/         # backref to optional *, followed by slash
#sg             # s modifier to make . match \n, and g modifier

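The "magic" here is that the backreference \1 requires a star * before the closing slash only when one was found right after the opening slash.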

Answer 2

A quick and dirty way in awk (gensub is a GNU awk extension, so this needs gawk):

awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' input_file

Test:

$ cat file
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.

Advice is what we ask for when we already know the /* answer */ but wish we didn't.
$ awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' file
fear

answer
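For an awk without gensub (mawk, for example), the same field-based idea can be sketched with substr instead. This is only an illustrative variant under the same assumptions as the command above: the [/.../] token is a single whitespace-separated field, and the word after /* is the one wanted. Unlike the original, it does not echo blank lines. The input lines are shortened, hypothetical copies of the sample.

```shell
out=$(printf '%s\n' 'dare not [/fear/] of what' 'know the /* answer */ but wish' |
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\[\//)
                # Drop the two leading and two trailing delimiter characters.
                print substr($i, 3, length($i) - 4)
            else if ($i ~ /^\/\*/)
                # The wanted word is assumed to be the next field.
                print $(i + 1)
        }
    }')

printf '%s\n' "$out"
# fear
# answer
```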

Answer 3

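Single-Line Matching

If you really want to do this in sed, you can extract your delimited patterns relatively easily as long as they are on the same line.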

# Using GNU sed. Escape a whole lot more if your sed doesn't handle
# the -r flag.
sed -rn 's![^*/]*(/\*?.*/).*!\1!p' /tmp/foo
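As a rough illustration of what this substitution produces, the sketch below runs it on shortened, hypothetical copies of the two sample lines. It uses -E, which GNU sed accepts as another spelling of -r. The capture group includes the delimiters, so an optional second pass (my addition, not part of the answer) strips them along with any padding spaces.

```shell
# Shortened, hypothetical copies of the two sample lines from the question.
kept=$(printf '%s\n' 'dare not [/fear/] of what' 'know the /* answer */ but wish' |
    sed -En 's![^*/]*(/\*?.*/).*!\1!p')
printf '%s\n' "$kept"
# /fear/
# /* answer */

# Optional second pass: drop the leading "/" or "/*" and the trailing "/" or "*/".
bare=$(printf '%s\n' "$kept" | sed -E 's!^/\*? *!!; s! *\*?/$!!')
printf '%s\n' "$bare"
# fear
# answer
```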

Multi-Line Matching

If you want to perform multi-line matching with sed, things get a little uglier. However, it can certainly be done.

# Multi-line matching of delimiters with GNU sed.
sed -rn ':loop
         /\/[^\/]/ { 
             N
             s![^*/]+(/\*?.*\*?/).*!\1!p
             T loop
         }' /tmp/foo

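The trick is to look for a starting delimiter, then keep appending lines in a loop until you find the ending delimiter.

This works really well as long as you actually do have an ending delimiter. Otherwise, the contents of the file will keep being appended to the pattern space until sed finds one, or until it reaches the end of the file. This may cause problems with certain versions of sed, or with really, really large files, because the size of the pattern space grows out of control.

See GNU sed's Limitations and Non-limitations for more information.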
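The append-until-closed loop, and what happens when the closing delimiter never arrives, can be sketched in a reduced form. This is not the script above: it is a simplified variant that assumes only /* ... */ spans, at most one span per accumulated block, and a made-up input. It stops appending at the last line so that an unclosed opener just produces no output.

```shell
# Hypothetical input: one span crossing a line break, plus an opener that is never closed.
input=$(printf '%s\n' 'before /* spans' 'two lines */ after' 'open /* never closed' 'tail')

out=$(printf '%s\n' "$input" | sed -n '
:loop
/\/\*/{
    # Opener seen but no closer yet: pull in the next line, unless this is the last one.
    /\*\//!{
        $!{
            N
            b loop
        }
    }
    # Print what lies between the delimiters; nothing happens if the closer never arrived.
    s!.*/\*\(.*\)\*/.*![\1]!p
}')

printf '%s\n' "$out"
# [ spans
# two lines ]
```

The unclosed opener on the third line pulls every remaining line into the pattern space before giving up, which is the growth the caveat above warns about.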

Extracting a specific pattern from lines using sed, awk, or perl
