Question

Input:

OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT bcd111: : Succeeded.

I want to filter out only the hosts that match "Warning".

Output:

abc123 
abc1234
bcd111

I have tried the following regex, but it matches all of them, duplicates included.

([\w]+)\s+:\s+:\s+Warning

Is it possible to avoid the duplicates using a regex?

Answer 1

When you hear "unique" in Perl, think "hash":

#!/usr/bin/perl
use warnings;
use strict;

my %uniq;
while (<>) {
    /:?(\S+?)[:\s]+Warning/ and $uniq{$1} = 1;
}

print "$_\n" for keys %uniq;

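BTW, your input and regex do not produce the output you indicate. I changed the regex, but I am not sure your input sample is correct. Is the placement of the colons really that wild?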
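For readers who want to try the same idea outside Perl, here is a minimal sketch in Python (not the thread's language, used only for illustration). It reuses the regex from the script above and relies on dict keys being unique, as Perl hash keys are. The function name `unique_warning_hosts` and the shortened log lines are hypothetical. Unlike `keys %uniq`, a Python dict preserves first-seen order.

```python
import re

# Same pattern as in the Perl script above; the capture is the host name.
HOST_RE = re.compile(r":?(\S+?)[:\s]+Warning")


def unique_warning_hosts(lines):
    """Return each host that has a Warning line, once, in first-seen order."""
    seen = {}  # dict keys are unique, like the keys of the Perl hash
    for line in lines:
        m = HOST_RE.search(line)
        if m:
            seen[m.group(1)] = True
    return list(seen)


# Shortened, hypothetical log lines modelled on the question's input.
SAMPLE = [
    "OUT :abc123: : Warning: a.fw is older than it should be",
    "OUT :abc123 : : Warning: a.fw.schedule is older than it should be",
    "OUT abc1234: : Warning: / filesystem 100% full",
    "OUT abc1234: : Warning: / filesystem 100% full",
    "OUT bcd111: : Warning: a.fw.schedule is older than it should be",
    "OUT bcd111: : Succeeded.",
]

print(unique_warning_hosts(SAMPLE))
```

The last line has no "Warning", so it never reaches the dict. bcd111 is still reported because of its earlier warning line.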

Answer 2

OUT\s*:?([^:]*):(?=.*?\bWarning\b)(?:(?!OUT).)*(?!.*?\1[:\s]*Warning)

You can try this. See the demo and grab the capture group.

http://regex101.com/r/sK8oK9/12

Answer 3

You can use this perl one-liner:

perl -lane 'if (/\bWarning\b/) { $F[1] =~ s/\W+//g; print $F[1] }' file
abc123
abc123
abc1234
abc1234
abc1234
bcd111

Answer 4

Use this pattern with the g and s options:

OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)

Demo
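As a rough check of how this pattern behaves, here is a sketch that runs it through Python's `re` module (an assumption for illustration only; the answer does not name a regex engine). `re.findall` stands in for the g option and `re.DOTALL` for the s option. The negative lookahead rejects a host whenever the same captured text shows up before a later "Warning", so only the last warning line of each host matches. The helper name `last_occurrence_hosts` is hypothetical.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = r"OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)"

# The input from the question.
LOG = """\
OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT bcd111: : Succeeded.
"""


def last_occurrence_hosts(text):
    """Return the hosts captured by PATTERN, one per host."""
    # DOTALL lets '.' cross newlines, so the lookahead scans the rest of the log.
    captures = re.findall(PATTERN, text, flags=re.DOTALL)
    # [^:]+ can pick up whitespace around the name, so trim each capture.
    return [host.strip() for host in captures]


print(last_occurrence_hosts(LOG))
```

Because the lookahead rescans the remainder of the input for every candidate, this approach does more work than a hash as the log grows. That is one reason the hash-based answers are usually preferred.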

Answer 5

This is more of a supplement to @choroba's answer above, since he nailed it with "when you hear 'unique', think 'hash'". You should accept @choroba's answer :-)

Here I have reduced the regex part of the question to a call to grep in order to focus on uniqueness, changed the data in the file slightly (so that it fits here), and saved it as dups.log:

# dups.log 
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.

This one-liner gives the following output:

perl -E '++$seen{$_} for grep{/Warning/} <>; print keys %seen' dups.log
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)

This is almost the same as the output produced by uniq log_with_dups.log | grep Warning. It works because perl creates a hash key from each line it reads: the first time it sees a line it adds a key to the hash, and every time it sees that key it increments the key's value (++$seen{$_}). For perl, "the same key" here means a duplicated line. Try printing values %seen, or use -MDDP and p %seen, to get a feel for what is going on.

To get your output, @choroba's regex adds the capture (rather than the whole line) to the hash:

perl -nE '/:?(\S+?)[:\s]+Warning/ && ++$seen{$1} }{ say for keys %seen' dups.log

But, just as with the whole-line approach above, the regex creates only one copy of each key (from the match and capture) and then increments it with ++, so you end up with "unique" keys in the %seen hash, a la uniq.

This is a neat perl trick that you will never forget :-)
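To see what the counts in such a hash look like, here is a small sketch in Python (again not the thread's language, chosen only for illustration). `collections.Counter` plays the role of the %seen hash: each captured host becomes a key exactly once, and its value records how many warning lines mentioned it. The function name `count_warning_hosts` is hypothetical; the data is the dups.log sample from this answer.

```python
import re
from collections import Counter

# The regex used in this answer; group 1 is the host name.
HOST_RE = re.compile(r":?(\S+?)[:\s]+Warning")

DUPS_LOG = """\
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.
"""


def count_warning_hosts(lines):
    """Map each host to the number of Warning lines that mention it."""
    seen = Counter()
    for line in lines:
        m = HOST_RE.search(line)
        if m:
            seen[m.group(1)] += 1  # same effect as ++$seen{$1}
    return seen


counts = count_warning_hosts(DUPS_LOG.splitlines())
print(sorted(counts))  # the unique hosts (the keys)
print(dict(counts))    # the per-host warning counts (the values)
```

The keys answer the original question; the values are a by-product that can be handy when you also want to know which host is the noisiest.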

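References:

Per @choroba, this SO question has some good explanations of the perl idiom of using a %seen hash.
perlfaq4 covers the uniq hash trick.
Perlmaven shows how to create your own "home made" uniq using this method.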
...

Remove duplicates using a regular expression

5 answers: