shell: get lines from FILE1 based on the contents of FILE2

Date: 2014-04-16 22:11:19

Tags: bash awk grep sh

I have a file like this (maillog):

    Feb 22 23:53:39 info postfix[102]: connect from APVLDPDF01[...
    Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
    Feb 22 23:53:39 info postfix[103]: BA1D7805A1: message-id 
    Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
    Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<CORREO1@GM.COM>, relay...
    Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
    Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>,...
    Feb 22 23:53:39 info postfix[102]: disconnect from APVLDPDF01...
    Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
    Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
    Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
    Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed

And a second file (mailids) like this:

    6DBDD8039F:
    3B15BC803B:
    BA1D7805A1:
    2BD19803B4:

I want to get an output file containing:

    Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent

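That is, only the lines whose ID appears in the second file; in this example only the ID BA1D7805A1: is in that file. There is one more condition: the line must have the form "ID to=<". In other words, only lines that contain "to=<" and whose ID is listed in the second file should be output.

I have found several solutions, but I have serious doubts about their performance. The maillog file is 2 GB, roughly 10 million lines, and the mailids file has about 32,000 lines.

The process takes far too long, and I have never seen it finish. I have tried awk and grep commands, but I cannot find the best approach.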

2 answers:

Answer 0 (score: 2)

    grep -F -f mailids maillog | grep 'to=<'

From the grep man page:

   -F, --fixed-strings
          Interpret PATTERN as a  list  of  fixed  strings,  separated  by
          newlines,  any  of  which is to be matched.  (-F is specified by
          POSIX.)

   -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)
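
As a rough sketch, the pipeline can be tried on a cut-down copy of the sample data from the question. The scratch directory and the shortened log lines are assumptions made for illustration only. The sketch also runs the two filters in the opposite order, which selects the same lines; on a large log that order may leave less work for the ID lookup, but this has not been measured here.

```shell
#!/bin/sh
# Build a tiny sample in a scratch directory (hypothetical data based on the question).
dir=$(mktemp -d)
cat > "$dir/maillog" <<'EOF'
Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01
Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<CORREO1@GM.COM>, relay
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF
printf '%s\n' '6DBDD8039F:' 'BA1D7805A1:' > "$dir/mailids"

# Order used in the answer: look up the IDs first, then keep only "to=<" lines.
result_a=$(grep -F -f "$dir/mailids" "$dir/maillog" | grep 'to=<')

# Reversed order: keep "to=<" lines first, then look up the IDs.
result_b=$(grep -F 'to=<' "$dir/maillog" | grep -F -f "$dir/mailids")

echo "$result_a"
rm -rf "$dir"
```

Both variables should hold the single `BA1D7805A1: to=<CO3@GM.COM>` line.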

Answer 1 (score: 1)

It is better to add the -w option:

   -w, --word-regexp
          Select  only  those  lines  containing  matches  that form whole
          words.  The test is that the matching substring must  either  be
          at  the  beginning  of  the  line,  or  preceded  by  a non-word
          constituent character.  Similarly, it must be either at the  end
          of  the  line  or  followed by a non-word constituent character.
          Word-constituent  characters  are  letters,  digits,   and   the
          underscore.

This is the command I usually use:

    grep -Fwf mailids maillog | grep 'to=<'

If the ID is always in column 6, try this awk one-liner:

    awk 'NR==FNR{a[$1];next} /to=</ && ($6 in a)' mailids maillog
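
To see what -w changes, here is a small sketch. The second log line is made up: its queue ID merely ends with a listed ID, which is an assumption for the demonstration and not something taken from the real log.

```shell
#!/bin/sh
# Hypothetical sample: one listed ID, plus a log line whose ID only ends with it.
dir=$(mktemp -d)
printf '%s\n' 'BA1D7805A1:' > "$dir/mailids"
cat > "$dir/maillog" <<'EOF'
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:42 info postfix[919]: 7BA1D7805A1: to=<CO4@GM.COM>, status=sent
EOF

# Without -w the fixed string also matches inside the longer ID.
count_plain=$(grep -F -f "$dir/mailids" "$dir/maillog" | grep -c 'to=<')

# With -w the match must start at a word boundary, so the longer ID is skipped.
count_word=$(grep -Fwf "$dir/mailids" "$dir/maillog" | grep -c 'to=<')

echo "without -w: $count_plain, with -w: $count_word"
rm -rf "$dir"
```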
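
The same awk approach, spread over several lines with comments and run on a shortened sample, might look like the sketch below. The sample data and the array name `ids` are assumptions for illustration. Note that the `NR == FNR` test assumes mailids is not empty; with an empty first file the condition would also hold while reading maillog.

```shell
#!/bin/sh
# Build a tiny sample in a scratch directory (hypothetical data based on the question).
dir=$(mktemp -d)
printf '%s\n' '6DBDD8039F:' 'BA1D7805A1:' > "$dir/mailids"
cat > "$dir/maillog" <<'EOF'
Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<CORREO1@GM.COM>, relay
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF

awk_out=$(awk '
    # First file (mailids): NR equals FNR only here, so store each ID as a key.
    NR == FNR { ids[$1]; next }
    # Second file (maillog): print lines that contain "to=<" and whose
    # sixth field is one of the stored IDs.
    /to=</ && ($6 in ids)
' "$dir/mailids" "$dir/maillog")

echo "$awk_out"
rm -rf "$dir"
```

Because the IDs are held in an awk array, maillog is read only once and each line costs a single array lookup on field 6.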