Compare 2 files, find a string, and report several lines from one of them

Date: 2014-01-10 13:17:02

Tags: perl awk

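I want to compare 2 files and report the matching strings: for each line of infile2, search for that string in infile1, and if it is found there, print the line before it, the string itself, and the 2 lines after it.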

infile2:

john
jack
jeff

infile1:

22894
john
street3
city
56438
danny
street2
city
22894
john
street3
city
33456
jeff
street2
city
22894
john
street3
city

Expected output:

22894
john
street3
city
22894
john
street3
city
33456
jeff
street2
city
22894
john
street3
city

I wrote a Perl script for this, run as perl script.pl infile2 infile1:

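#!/usr/bin/perl
use warnings;
use strict;

# Usage: perl script.pl infile2 infile1  (names file first, records file second)
my ($infile2, $infile1) = @ARGV;
open(my $fh1, '<', $infile1) or die "Cannot open $infile1: $!";
open(my $fh2, '<', $infile2) or die "Cannot open $infile2: $!";

# Each iteration reads one 4-line record from $fh1 AND one name from $fh2,
# so record N is only ever compared with name N.
while ((my @lines = map { scalar readline($fh1) } 1 .. 4)[0] and (my $names = <$fh2>)) {
    if ($lines[1] eq $names) {
        print "$lines[0]$lines[1]$lines[2]$lines[3]";
    }
}
print "\n";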

But I only get:

22894
john
street3
city
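
The script above stops after the first record because each iteration of the while loop reads one 4-line record from infile1 and one name from infile2 together, so record N is only ever compared with name N, and the loop ends as soon as infile2 runs out of names. A minimal sketch of a fix, assuming every record in infile1 is exactly 4 lines with the name on line 2, is to read all names into a hash first and then walk the records; the sub name matching_records is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines of every 4-line record whose second line (the name)
# appears in the list of wanted names. Assumes fixed 4-line records.
sub matching_records {
    my ($names, $lines) = @_;
    my %wanted = map { $_ => 1 } @$names;    # name lookup table
    my @out;
    for (my $i = 0; $i + 3 < @$lines; $i += 4) {
        my @rec = @$lines[ $i .. $i + 3 ];    # one complete record
        push @out, @rec if $wanted{ $rec[1] };
    }
    return @out;
}

# Command-line use: perl script.pl infile2 infile1
if (@ARGV) {
    die "usage: $0 infile2 infile1\n" unless @ARGV == 2;
    my ($names_file, $records_file) = @ARGV;
    open(my $nfh, '<', $names_file)   or die "Cannot open $names_file: $!";
    open(my $rfh, '<', $records_file) or die "Cannot open $records_file: $!";
    chomp(my @names = <$nfh>);
    chomp(my @lines = <$rfh>);
    print "$_\n" for matching_records(\@names, \@lines);
}
```

It takes the files in the same order as the question's command line (names file first, then records file). Because the names are looked up in a hash, the order of names in infile2 no longer has to line up with the order of records in infile1.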

4 answers:

Answer 0 (score: 2)

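I think you need at least this much, so that a name is only matched on the name line of each 4-line record and not on some other line of the address file: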

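$ awk '
NR==FNR { names[$0]; next }     # first file (file2): store each name as an array key
{
    lineNr = ((FNR+3)%4)+1      # position (1-4) of this line within its 4-line record
    rec = rec $0 ORS            # append the line to the current record
}
lineNr == 2 { name = $0 }       # line 2 of each record is the name
lineNr == 4 {                   # record complete: print it if the name is wanted
    if (name in names) {
        printf "%s", rec
    }
    rec=""
}
' file2 file1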
22894
john
street3
city
22894
john
street3
city
33456
jeff
street2
city
22894
john
street3
city
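
The reason for tracking the position inside each 4-line record can be checked with a small Perl sketch. The record 11111/bob/street1/jack below is made-up data in which the city happens to equal the name jack, and the sub names any_line_hits and name_line_hits are hypothetical: matching on any line also fires on line 4 (a city), while applying the answer's position formula ((FNR+3)%4)+1 and keeping only position 2 reports just the real name line.

```perl
use strict;
use warnings;

# Line numbers (1-based) where a wanted name appears on ANY line.
sub any_line_hits {
    my ($wanted, $lines) = @_;
    return grep { $wanted->{ $lines->[ $_ - 1 ] } } 1 .. scalar(@$lines);
}

# Line numbers where a wanted name appears at position 2 of a 4-line record,
# using the same position formula as the awk answer: ((FNR+3)%4)+1.
sub name_line_hits {
    my ($wanted, $lines) = @_;
    return grep { (($_ + 3) % 4) + 1 == 2 && $wanted->{ $lines->[ $_ - 1 ] } }
        1 .. scalar(@$lines);
}

my %wanted = map { $_ => 1 } qw(john jack jeff);
# Hypothetical data: the first record's city is "jack", which is also a name.
my @lines = qw(11111 bob street1 jack 22894 john street3 city);

print "any line:   @{[ any_line_hits(\%wanted, \@lines) ]}\n";     # 4 6
print "name lines: @{[ name_line_hits(\%wanted, \@lines) ]}\n";    # 6
```

A line-based scan would therefore print a window around line 4 that straddles two records, which is the kind of false match the record-aware awk program avoids.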

Answer 1 (score: 2)

Here is another option:

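use strict;
use warnings;

# Remove the last argument (the records file, inFile1) and keep it for later;
# $last (the previous line) starts out undefined.
my ( $file2, $last ) = pop;

# <> now reads only the names file (inFile2): build a lookup hash of names.
my %hash = map { chomp; $_ => 1 } <>;

# Put the records file back on @ARGV so the next <> loop reads it.
push @ARGV, $file2;
while (<>) {
    chomp;
    # On a match, print the previous line, this line, and the next two lines
    # (each extra <> in scalar context reads one more line, newline included).
    print "$last\n$_\n" . <> . <> if $hash{$_};
    $last = $_;
}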

Usage: perl script.pl inFile2 inFile1 [>outFile]

The optional last part, >outFile, is a shell redirection that sends the output to a file.

The input files are named as in your usage; the smaller file of proper names comes first.

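The initial pop removes the last argument (inFile1, the records file) and saves it for later, and a hash of the proper names is then built from inFile2. The script then iterates through inFile1 and, whenever the current line is one of the proper names, prints the previous line, the current line, and the next two lines. The variable $last is not initialized right away because no match happens before there is a previous line.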

Hope this helps!
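
The previous-line/next-two-lines window in this answer can be generalized to any number of lines before and after a match. Below is a minimal Perl sketch that works on an array instead of <>, so it can be tried without files; the sub name context_window and its $before/$after parameters are hypothetical. Like the answer, it does not re-check the lines it consumes as trailing context, and at end of input it simply stops instead of reading past the last line.

```perl
use strict;
use warnings;

# For every line listed in %$wanted, return up to $before preceding lines,
# the matching line, and up to $after following lines. Lines taken as
# trailing context are not re-checked for matches (as with <> . <> above).
sub context_window {
    my ($wanted, $lines, $before, $after) = @_;
    my @out;
    my @recent;    # the last $before lines read so far (generalizes $last)
    my $remember = sub {
        push @recent, $_[0];
        shift @recent while @recent > $before;
    };

    my $i = 0;
    while ($i < @$lines) {
        my $line = $lines->[ $i++ ];
        if ($wanted->{$line}) {
            push @out, @recent, $line;
            $remember->($line);
            for (1 .. $after) {
                last if $i >= @$lines;    # end of input: stop early
                my $next = $lines->[ $i++ ];
                push @out, $next;
                $remember->($next);
            }
        }
        else {
            $remember->($line);
        }
    }
    return @out;
}

my %wanted = map { $_ => 1 } qw(john jack jeff);
my @lines  = qw(22894 john street3 city 56438 danny street2 city
                22894 john street3 city 33456 jeff street2 city
                22894 john street3 city);

# $before = 1, $after = 2 reproduces the expected output from the question
print "$_\n" for context_window(\%wanted, \@lines, 1, 2);
```

Setting $before and $after to 0 returns only the matching names, which is a quick way to check which lines the lookup hash actually hits.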

Answer 2 (score: 1)

Here is an awk solution:

awk -f a.awk file2 file1

where a.awk is:

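NR==FNR {
    a[$1]++          # first file (file2): remember each name
    next
}
{
    b[FNR]=$0        # second file (file1): buffer every line by line number
}

END {
    # for every buffered line that is a name, print the line before it,
    # the line itself and the two lines after it, skipping positions
    # outside file1 (FNR is the line count of file1 here)
    for (i=1; i<=FNR; i++)
        if (b[i] in a)
            for (j=i-1; j<=i+2; j++)
                if (j>=1 && j<=FNR)
                    print b[j]
}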

Output:

22894
john
street3
city
22894
john
street3
city
33456
jeff
street2
city
22894
john
street3
city

Answer 3 (score: 1)

Your input files:

 root@Aix:/tmp# cat file2
 john
 jack
 jeff

 root@Aix:/tmp# cat file1
 22894
 john
 street3
 city
 56438
 danny
 street2
 city
 22894
 john
 street3
 city
 33456
 jeff
 street2
 city
 22894
 john
 street3
 city

Awk Code

 root@Aix:/tmp# cat test.sh

 awk 'FNR==NR {
          A[$1]        # first file (file2): array A holds the names to search for
          next
      }
      # second file (file1): a previous line exists and column 1 is a name in A
      (p && ($1 in A)) {
          # reset the line counter for this match
          i=0

          # print the previous line and the current (matching) line
          print p RS $0

          # print the next number_of_lines lines, stopping early at end of file
          while (++i<=number_of_lines && (getline) > 0)
              print
      }
      {
          # remember the current line as the previous line for the next test
          p=$0
      }
    ' number_of_lines="2" file2 file1

Result:

 # sh test.sh
 22894
 john
 street3
 city
 22894
 john
 street3
 city
 33456
 jeff
 street2
 city
 22894
 john
 street3
 city

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk or nawk.

Change number_of_lines="2" to suit your needs.