Question

I have a file like this (maillog):

    Feb 22 23:53:39 info postfix[102]: connect from APVLDPDF01[...
    Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
    Feb 22 23:53:39 info postfix[103]: BA1D7805A1: message-id 
    Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
    Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<CORREO1@GM.COM>, relay...
    Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
    Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>,...
    Feb 22 23:53:39 info postfix[102]: disconnect from APVLDPDF01...
    Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
    Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
    Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
    Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed

And a second file (mailids) like this:

    6DBDD8039F:
    3B15BC803B:
    BA1D7805A1:
    2BD19803B4:

I want to get an output file containing the following:

    Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent

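That is, only the lines whose ID exists in the second file; in this example only the ID BA1D7805A1: is in that file. There is one more condition: the line must have the form "ID to=<". In other words, only lines that contain "to=<" and an ID listed in the second file should be output.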

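I have found different solutions, but I have serious concerns about performance. The maillog file is 2 GB, about 10 million lines, and the mailids file has about 32,000 lines.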

The process takes far too long, and I have never seen it finish. I have tried awk and grep commands, but I can't find the best approach.

Answer 1

    grep -F -f mailids maillog | grep 'to=<'

From the grep man page:

   -F, --fixed-strings
          Interpret PATTERN as a  list  of  fixed  strings,  separated  by
          newlines,  any  of  which is to be matched.  (-F is specified by
          POSIX.)

   -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)
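
To see the two-stage filter at work, here is a small sketch that runs it on a few of the complete lines from the question. The scratch directory and the `result` variable are only for illustration; with the real files you would run the pipeline directly.

```shell
# Build tiny stand-ins for maillog and mailids in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/maillog" <<'EOF'
Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF
printf '%s\n' '6DBDD8039F:' '3B15BC803B:' 'BA1D7805A1:' '2BD19803B4:' > "$tmpdir/mailids"

# Stage 1 keeps lines containing any listed ID as a literal string (-F),
# reading the IDs from the file (-f).
# Stage 2 keeps only the delivery ("to=<") lines among those.
result=$(grep -F -f "$tmpdir/mailids" "$tmpdir/maillog" | grep 'to=<')
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Only the BA1D7805A1 delivery line survives both stages. On a 2 GB log it may be worth timing the stages in the opposite order as well (`grep 'to=<' maillog | grep -F -f mailids`), since the selected lines are the same either way; which order is faster is something to measure rather than assume.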

Answer 2

It is better to add the -w option:

   -w, --word-regexp
          Select  only  those  lines  containing  matches  that form whole
          words.  The test is that the matching substring must  either  be
          at  the  beginning  of  the  line,  or  preceded  by  a non-word
          constituent character.  Similarly, it must be either at the  end
          of  the  line  or  followed by a non-word constituent character.
          Word-constituent  characters  are  letters,  digits,   and   the
          underscore.

This is the command I normally use:

    grep -Fwf mailids maillog | grep 'to=<'
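
A quick sketch of what -w changes. The second log line below is hypothetical (it is not from the question): its ID merely ends with a listed ID, so a plain fixed-string match picks it up while a whole-word match does not.

```shell
tmpdir=$(mktemp -d)
# Line 1 is from the question; line 2 is a made-up line whose ID
# only contains the listed ID as a suffix.
cat > "$tmpdir/maillog" <<'EOF'
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[918]: 7BA1D7805A1: to=<CO3@GM.COM>, status=sent
EOF
printf '%s\n' 'BA1D7805A1:' > "$tmpdir/mailids"

# Count the delivery lines selected without and with whole-word matching.
plain=$(grep -F -f "$tmpdir/mailids" "$tmpdir/maillog" | grep -c 'to=<')
whole=$(grep -Fwf "$tmpdir/mailids" "$tmpdir/maillog" | grep -c 'to=<')
echo "without -w: $plain line(s); with -w: $whole line(s)"

rm -rf "$tmpdir"
```

Without -w both lines are counted; with -w only the exact ID is, because the match must not be preceded by a letter, digit, or underscore.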

If the ID is always in column 6, try this awk one-liner:

    awk 'NR==FNR{a[$1];next} /to=</ && ($6 in a)' mailids maillog
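
The same awk logic spelled out with comments, as a sketch run on a few complete lines from the question. The array name `ids`, the scratch directory, and the `result` variable are illustrative choices. It assumes, as the one-liner does, that the ID (including its trailing colon) is the sixth whitespace-separated field and that mailids is not empty.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/maillog" <<'EOF'
Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF
printf '%s\n' '6DBDD8039F:' '3B15BC803B:' 'BA1D7805A1:' '2BD19803B4:' > "$tmpdir/mailids"

result=$(awk '
    # While reading the first file (mailids), NR equals FNR:
    # store each ID as an array key and skip to the next line.
    NR == FNR { ids[$1]; next }

    # While reading the second file (maillog): print the line when it
    # contains "to=<" and its sixth field is one of the stored IDs.
    /to=</ && ($6 in ids)
' "$tmpdir/mailids" "$tmpdir/maillog")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Because the IDs sit in an associative array, each log line costs one lookup regardless of how many IDs are listed, and the log is read in a single pass.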

shell: get lines from FILE1 based on the contents of FILE2
