Question

Let me try to explain this as clearly as I can...

I have a script that at some point does this:

grep -vf ignore.txt input.txt

ignore.txt contains a bunch of lines that I want grep to ignore, hence the -v (meaning I don't want to see them in grep's output).

Now, what I want to know is how many lines of input.txt were ignored by each line of ignore.txt.

For example, if ignore.txt contains these lines:

line1
line2
line3

I would like to know how many lines of input.txt were ignored because of line1, how many because of line2, and so on.

Any ideas on how I can do this?

I hope that makes sense... Thanks!

Answer 1

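Note that the sum of the ignored lines plus the shown lines may NOT add up to the total number of lines: a line such as "line1 and line2 are here" will be counted twice.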

#!/usr/bin/perl
use warnings;
use strict;

local @ARGV = 'ignore.txt';
chomp(my @pats = <>);

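foreach my $pat (@pats) {
    # \Q...\E backslash-escapes every non-word character so the shell passes
    # the pattern to grep unchanged; "--" keeps a pattern that starts with
    # "-" from being parsed as an option.
    print "$pat: ", qx/grep -c -- \Q$pat\E input.txt/;
}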
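To make the double counting concrete, here is a minimal Python sketch of the same per-pattern count. It assumes each line of ignore.txt is a regular expression (Python's `re` syntax is not identical to grep's), and the function name `count_lines_per_pattern` and the sample data are made up for illustration.

```python
import re

def count_lines_per_pattern(patterns, lines):
    """Return {pattern: number of lines it matches}, like `grep -c` run once per pattern."""
    return {pat: sum(1 for line in lines if re.search(pat, line)) for pat in patterns}

# Hypothetical sample data, not taken from the question.
patterns = ["line1", "line2"]
lines = ["line1 and line2 are here", "only line1", "nothing to see"]

counts = count_lines_per_pattern(patterns, lines)
# Lines that `grep -vf ignore.txt` would still show.
shown = [l for l in lines if not any(re.search(p, l) for p in patterns)]

print(counts)      # {'line1': 2, 'line2': 1}
print(len(shown))  # 1
```

The per-pattern counts add up to 3 and one line is shown, which gives 4 for a 3-line input, because the first line is credited to both patterns.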

Answer 2

According to unix.stackexchange,

grep -o pattern file | wc -l

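counts the total number of occurrences of a given pattern in a file. Given that, and given that you are already using a script, one solution is to run several grep instances to filter and count the patterns you want to ignore.

However, I would try to build a more comfortable solution involving a scripting language such as Python.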
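As a rough illustration of what such a Python approach could look like, the sketch below contrasts counting occurrences (what `grep -o pattern | wc -l` does) with counting matching lines (what `grep -c pattern` does). The function names and sample lines are assumptions for the example, and Python's `re` syntax only approximates grep's.

```python
import re

def count_occurrences(pattern, lines):
    """Count every non-empty, non-overlapping match, like `grep -o pattern | wc -l`."""
    return sum(1 for line in lines for m in re.finditer(pattern, line) if m.group())

def count_matching_lines(pattern, lines):
    """Count lines with at least one match, like `grep -c pattern`."""
    return sum(1 for line in lines if re.search(pattern, line))

# Hypothetical sample data.
lines = ["foo foo", "foo", "bar"]
print(count_occurrences("foo", lines))     # 3
print(count_matching_lines("foo", lines))  # 2
```

For the question as asked, the per-line count is probably the relevant one, since `grep -v` drops whole lines regardless of how many times a pattern occurs in them.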

Answer 3

This might work for you:

# seq 1 15 | sed '/^1/!d' | sed -n '$='
7

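Explanation:

Delete all lines except those that match. Pipe these matching (ignored) lines to another sed command. That second command suppresses all output except the line number of the last line. So in this example with 1 to 15, lines 1 and 10 to 15 are ignored, 7 lines in all.

EDIT:

Sorry, I misread the question (and am still a little confused!):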
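The same count can be checked with a few lines of Python. This is only a sketch of the pipeline's logic, not a replacement for it; the variable names are arbitrary.

```python
import re

lines = [str(n) for n in range(1, 16)]             # same input as `seq 1 15`
kept = [l for l in lines if re.search(r"^1", l)]   # what survives sed '/^1/!d'
print(kept)       # ['1', '10', '11', '12', '13', '14', '15']
print(len(kept))  # 7
```

One difference worth knowing: `sed -n '$='` prints nothing at all when no line matches, whereas `len(kept)` would give 0.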

 sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt| uniq -c,' ignore.txt | sh

This shows the number of matches for each pattern in ignore.txt.

 sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,' ignore.txt | sh

This shows the number of non-matches for each pattern in ignore.txt.

If you are using GNU sed, these should work too:

sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt | uniq -c,;e' ignore.txt

or

sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,;e' ignore.txt

N.B. Your success with patterns may vary, i.e. check them for metacharacters beforehand.
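The metacharacter caveat is easy to demonstrate. In the sketch below (sample strings invented for the example), an ignore line meant as the literal text `a.b` also matches `axb` when treated as a regular expression, and escaping it restores the literal meaning. With grep itself, the `-F` option treats patterns as fixed strings and serves the same purpose.

```python
import re

lines = ["a.b", "axb"]   # hypothetical input lines
literal = "a.b"          # meant literally, but "." matches any character in a regex

as_regex = sum(1 for l in lines if re.search(literal, l))
as_literal = sum(1 for l in lines if re.search(re.escape(literal), l))
print(as_regex, as_literal)  # 2 1
```

In the sed one-liners above there is an extra wrinkle: characters such as `/` and `&` in an ignore line would also interfere with the generated sed script, so they would need checking too.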

On reflection, I thought this could be improved to:

sed 's,.*,/&/i\\matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c

or

sed 's,.*,/&/!i\\non-matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c

But NO, on large files this is actually slower.

Answer 4

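This script counts the matched lines via a hash lookup and saves the lines to be printed in @result, where you can process them. To emulate grep, just print them.

I wrote the script so that it prints out an example. To use it with files, uncomment the commented-out code in the script and comment out the lines marked # example line.

Code: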

use strict;
use warnings;
use v5.10;
use Data::Dumper;  # example line

# Example data. 
my @ignore = ('line1' .. 'line9'); # example line
my @input  = ('line2' .. 'line9', 'fo' .. 'fx', 'line2', 'line3'); # example line

#my $ignore = shift;  # first argument is ignore.txt
#open my $fh, '<', $ignore or die $!; 
#chomp(my @ignore = <$fh>);
#close $fh;

my @result;

my %lookup = map { $_ => 0 } @ignore;
my $rx = join '|', map quotemeta, @ignore;

#while (<>) {  # This processes the remaining arguments, input.txt etc
for (@input) { # example line
    chomp;     # Required to avoid bugs due to missing newline at eof
    if (/($rx)/) {
        $lookup{$1}++;
    } else {
        push @result, $_;
    }
}

#say for @result;       # This will emulate grep
print Dumper \%lookup;  # example line

Output:

$VAR1 = {
          'line6' => 1,
          'line1' => 0,
          'line5' => 1,
          'line2' => 2,
          'line9' => 1,
          'line3' => 2,
          'line8' => 1,
          'line4' => 1,
          'line7' => 1
        };
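For readers who prefer Python, the same idea (one combined regular expression plus a dictionary of counters) might be sketched as follows. It reuses the example data from the Perl script and, like `quotemeta` there, treats the ignore lines as literal strings via `re.escape`, which differs from how `grep -f` interprets them.

```python
import re

# Same example data as the Perl script above.
ignore = [f"line{n}" for n in range(1, 10)]
input_lines = ([f"line{n}" for n in range(2, 10)]
               + [f"f{c}" for c in "opqrstuvwx"]
               + ["line2", "line3"])

counts = dict.fromkeys(ignore, 0)
rx = re.compile("|".join(map(re.escape, ignore)))

result = []                        # lines that grep -v would print
for line in input_lines:
    m = rx.search(line)
    if m:
        counts[m.group()] += 1     # credit only the first pattern that matched
    else:
        result.append(line)

print(counts)
print(len(result))  # 10
```

Because only the first match on a line is credited, each ignored line is counted exactly once, so the counts plus the printed lines add up to the input size. The trade-off is that when several patterns match the same line, the later ones get no credit for it.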

Answer 5

Are both ignore.txt and input.txt sorted?

If so, you can use the comm command!

$ comm -12 ignore.txt input.txt

How many lines were ignored?

$ comm -12 ignore.txt input.txt | wc -l

Or, if you want to do more processing, combine comm with awk:

$ comm ignore.txt input.txt | awk '
    END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}
    {
       if ($0 !~ /^\t/) {uniqtotal+=1}
       if ($0 ~ /^\t[^\t]/) {commtotal+=1}
       if ($0 ~ /^\t\t/) {igtotal+=1}
    }'

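Here I take advantage of the tabs that comm places in its output:

* If there are no tabs, the line is only in ignore.txt.
* If there is a single tab, the line is only in input.txt.
* If there are two tabs, the line is in both files.

By the way, not every line in ignore.txt is actually ignored. If a line isn't also in input.txt, it can't really be said to have been ignored.

With Dennis Williamson's suggestion: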

comm ignore.txt input.txt | awk '
   !/^\t/ {uniqtotal++}
   /^\t[^\t]/ {commtotal++}
   /^\t\t/ {igtotal++}
     END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}'
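The three totals can also be approximated in Python without requiring sorted input. The sketch below uses `collections.Counter` to pair up identical lines the way comm's merge does; the function name `comm_counts` and the sample lists are invented for the example. Note that, like comm, it compares whole lines, whereas `grep -f` also matches substrings, so the numbers can differ from what grep actually ignores.

```python
from collections import Counter

def comm_counts(ignore_lines, input_lines):
    """Return (only in ignore, only in input, in both), counting duplicate lines."""
    a, b = Counter(ignore_lines), Counter(input_lines)
    only_ignore = sum((a - b).values())   # comm column 1
    only_input = sum((b - a).values())    # comm column 2
    both = sum((a & b).values())          # comm column 3
    return only_ignore, only_input, both

# Hypothetical sample data.
uniqtotal, commtotal, igtotal = comm_counts(
    ["line1", "line2", "line3"],
    ["line2", "line3", "other"],
)
print(f"Ignored lines = {igtotal} Lines not ignored = {commtotal} "
      f"Lines unique to Ignore file = {uniqtotal}")
```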

Answer 6

This will print the number of ignored matches along with the matching pattern:

grep -of ignore.txt input.txt | sort | uniq -c

For example:

$ perl -le 'print "Coroline" . ++$s for 1 .. 21' > input.txt
$ perl -le 'print "line2\nline14"'               > ignore.txt

$ grep -of ignore.txt input.txt | sort | uniq -c
      1 line14
      3 line2

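That is, a line matching "line14" was ignored once, and lines matching "line2" were ignored 3 times.

If you just want to count the total number of ignored lines, this will work: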

grep -cof ignore.txt input.txt

Update: modified the example above to use strings so that the output is a little clearer.
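The example can be reproduced in Python as a cross-check. This is a sketch under the assumption that the patterns do not overlap: when they do, `grep -o` picks the longest match at a given position, while Python's alternation picks the first alternative listed, so the results can differ.

```python
import re
from collections import Counter

input_lines = [f"Coroline{n}" for n in range(1, 22)]   # same as the perl one-liner
patterns = ["line2", "line14"]

rx = re.compile("|".join(patterns))

# Equivalent of: grep -of ignore.txt input.txt | sort | uniq -c
counts = Counter(m.group() for line in input_lines for m in rx.finditer(line))
for pat, n in sorted(counts.items()):
    print(n, pat)

# Equivalent of: grep -cof ignore.txt input.txt
total_ignored = sum(1 for line in input_lines if rx.search(line))
print(total_ignored)  # 4
```

The three "line2" hits come from Coroline2, Coroline20 and Coroline21.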

Answer 7

while IFS= read -r pattern ; do
        printf '%s:' "$pattern"
        grep -c -v "$pattern" input.txt
done < ignore.txt

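grep's -c option counts the matching lines, but adding -v makes it count the non-matching lines instead. So, just loop over the patterns and count once for each pattern.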
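Since the loop reports, for each pattern, the lines that would remain rather than the lines ignored, the ignored count is the total minus that figure. A small Python sketch of the relationship, with an invented function name and sample data:

```python
import re

def count_non_matching(pattern, lines):
    """Like `grep -c -v pattern`: number of lines with no match."""
    return sum(1 for line in lines if not re.search(pattern, line))

# Hypothetical sample data.
lines = ["line1 a", "line2 b", "line1 c", "other"]
for pat in ["line1", "line2"]:
    kept = count_non_matching(pat, lines)
    print(f"{pat}: kept {kept}, ignored {len(lines) - kept}")
```

This prints `line1: kept 2, ignored 2` and `line2: kept 3, ignored 1`. In the shell loop, dropping the -v would give the ignored count directly.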

Counting lines ignored by grep
