How can I compare a column of one file with a column of another file using awk?

Asked: 2018-03-14 08:53:02

Tags: linux shell awk

I have two files as follows:

file1.txt

2018-03-14 13:23:00 CID [72883359]
2018-03-14 13:23:00 CID [275507537]
2018-03-14 13:23:00 CID [275507539]
2018-03-14 13:23:00 CID [207101094]
2018-03-14 13:23:00 CID [141289821]

and file2.txt

2018-03-14 13:23:00 CID [207101072]
2018-03-14 13:23:00 CID [275507524]
2018-03-14 13:23:00 CID [141289788]
2018-03-14 13:23:00 CID [72883352]
2018-03-14 13:23:01 CID [72883359]
2018-03-14 13:23:00 CID [275507532]

I need to compare the 4th column of the first file with the 4th column of the second file. I am using the following command:

awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt 

Its output looks like this:

2018-03-14 13:23:00 CID [72883359] 2018-03-14 13:23:01

The above command works fine, but the problem is that file1 and file2 are large, around 20k lines each, so the command takes a long time.

I would like it so that once a match is found, it skips the rest and moves on to the next one, i.e. some kind of break statement. Please help.
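For example, I was thinking of something along these lines (just a sketch, not tested): once a CID from file1 has been matched, delete it from the array so it is never checked again:

awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a){print a[$4],$4,$1,$2; delete a[$4]}' file1.txt file2.txt>file3.txt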

Below is my script.

#!/bin/sh

cron=1;


for((j = $cron; j >= 1; j--))
do

d1=`date -d "$date1  $j min ago" +%Y-%m-%d`
d2=`date -d 'tomorrow' '+%Y-%m-%d'`

t1=`date -d "$date1  2 min ago" +%R`
t2=`date -d "$date1  1 min ago" +%R`
t3=`date --date="0min" +%R`
done


cat /prd/firewall/logs/lwsg_event.log | egrep "$d1|$d2" | egrep "$t1|$t2|$t3" |  grep 'SRIR' | awk -F ' ' '{print $1,$2,$4,$5}'>file1.txt


cat /prd/firewall/logs/lwsg_event.log | egrep "$d1|$d2" | egrep "$t1|$t2|$t3" | grep 'SRIC' | awk -F ' ' '{print $1,$2,$4,$5}'>file2.txt


awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt

cat file3.txt | while read LINE
do
f1=`echo $LINE | cut -f 1 -d " "`
f2=`echo $LINE | cut -f 2 -d " "`

String1=$f1" "$f2

f3=`echo $LINE | cut -f 5 -d " "`
f4=`echo $LINE | cut -f 6 -d " "`

String2=$f3" "$f4


f5=`echo $LINE | cut -f 3 -d " "`
f6=`echo $LINE | cut -f 4 -d " "`

String3=$f5" "$f6

StartDate=$(date -u -d "$String1" +"%s")
FinalDate=$(date -u -d "$String2" +"%s")
echo "Diff for $String3 :" `date -u -d "0 $FinalDate sec - $StartDate sec" +"%H:%M:%S"` >final_output.txt
done


final_output.txt will be
Diff for CID [142298410] : 00:00:01
Diff for CID [273089511] : 00:00:00
Diff for CID [273089515] : 00:00:00
Diff for CID [138871787] : 00:00:00
Diff for CID [273089521] : 00:00:00
Diff for CID [208877371] : 00:00:00
Diff for CID [138871793] : 00:00:00
Diff for CID [138871803] : 00:00:00
Diff for CID [273089526] : 00:00:00
Diff for CID [273089545] : 00:00:00
Diff for CID [208877406] : 00:00:02
Diff for CID [208877409] : 00:00:01
Diff for CID [138871826] : 00:00:00
Diff for CID [74659680] : 00:00:00

3 Answers:

Answer 0 (score: 0)

Could you please try the following awk and let me know if this helps you.

awk 'FNR==NR{a[$4]=$0;next} ($4 in a){print a[$4],$1,$2}' file1.txt  file2.txt
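This stores each whole line of file1.txt keyed on its 4th field and, for every line of file2.txt whose 4th field was seen, prints that stored line followed by file2's date and time. With your sample files it should print:

2018-03-14 13:23:00 CID [72883359] 2018-03-14 13:23:01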

Answer 1 (score: 0)

Your overall script reads the same file multiple times and contains a number of other inefficiencies.

Without proper input to test against, this is hard to verify, but here is a refactoring which will hopefully at least provide a good direction for further exploration.

#!/bin/sh

cron=1;

for((j = $cron; j >= 1; j--))
do
    # Replace obsolescent `backticks` with $(modern command substitution) syntax
    d1=$(date -d "$date1  $j min ago" +%Y-%m-%d)
    d2=$(date -d 'tomorrow' '+%Y-%m-%d')
    
    t1=$(date -d "$date1  2 min ago" +%R)
    t2=$(date -d "$date1  1 min ago" +%R)
    t3=$(date --date="0min" +%R)
done

# Avoid useless cat and useless grep, fold everything into one Awk script
# See also http://www.iki.fi/era/unix/award.html
awk -v d="$d1|$d2" -v t="$t1|$t2|$t3" '
    $0 !~ d {next} $0 !~ t { next }
    { o = "" }
    /SRIR/ { o="file1.txt" }
    /SRIC/ { o="file2.txt" }
    o { print $1,$2,$4,$5 > o; o="" }' /prd/firewall/logs/lwsg_event.log

awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt

# Avoid uppercase for private variables
# Use read -r always
# Let read split the line
while read -r f1 f2 f5 f6 f3 f4 
do
    String1=$f1" "$f2
    String2=$f3" "$f4
    String3=$f5" "$f6
    
    StartDate=$(date -u -d "$String1" +"%s")
    FinalDate=$(date -u -d "$String2" +"%s")
    echo "Diff for $String3 :" $(date -u -d "0 $FinalDate sec - $StartDate sec" +"%H:%M:%S")
done <file3.txt >final_output.txt

I imagine the main bottleneck is that you process the log file multiple times, not the small Awk snippet which runs on the results and which you actually asked for help with.

This could still be refactored into a single Awk script. If you have GNU Awk, you should also be able to do the date calculations in Awk.
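For example, here is a sketch of the last two steps in GNU Awk only (mktime and strftime are gawk extensions; this assumes the same file1.txt and file2.txt produced above):

awk '
    # Convert "2018-03-14" + "13:23:00" to epoch seconds
    # (gawk mktime wants the form "YYYY MM DD HH MM SS")
    function epoch(d, t,    s) {
        s = d " " t
        gsub(/[-:]/, " ", s)
        return mktime(s)
    }
    FNR == NR { start[$4] = epoch($1, $2); next }
    ($4 in start) {
        diff = epoch($1, $2) - start[$4]
        # Third argument 1 means format in UTC, so a small difference
        # in seconds prints as 00:00:0N
        printf "Diff for %s %s : %s\n", $3, $4, strftime("%H:%M:%S", diff, 1)
    }' file1.txt file2.txt > final_output.txt

This avoids spawning two date processes for every line of file3.txt.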

Answer 2 (score: -1)

Have you considered the join command? Not many people seem to know about join.

NAME
       join - join lines of two files on a common field

SYNOPSIS
       join [OPTION]... FILE1 FILE2
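A sketch of what that could look like here (untested; join needs both inputs sorted on the join field, which is field 4 in this data, and the <( ) process substitution needs bash rather than plain sh):

# Join on the 4th field of both files; join prints the join field first,
# so the column order differs from the awk output.
join -j 4 <(sort -k4,4 file1.txt) <(sort -k4,4 file2.txt) > file3.txt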