How to compare files on different columns in Unix?

Date: 2016-11-05 14:43:47

Tags: linux shell awk sed grep

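I want to compare the filenames in Today.txt with those in Main.txt. If a filename matches, print all 6 columns of the matching line from Main.txt into a new file, matched.txt.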

For the files that do not match anything in Main.txt, list the filename and time from Today.txt in another new file, e.g. unmatched.txt.

Main.txt

 date      filename          timestamp space  count   status
Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

Today.txt

 filename       time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt   09:36
AR02_20161104.txt   09:36
ifs01_20161104.txt  21:16
TRIPS11_20161104.txt 09:16

Desired output: matched.txt

Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

unmatched.txt

CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt  21:16

Could you please help me?

Many thanks in advance!

3 answers:

Answer 0 (score: 0)

Using awk, one command each for matched and unmatched:
$ awk 'NR==FNR{a[$1]; next} $3 in a{print > "matched.txt"}' Today.txt Main.txt 
$ cat matched.txt 
Nov 4    CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

$ awk 'NR==FNR{a[$3]; next} !($1 in a) && FNR>1{print > "unmatched.txt"}' Main.txt Today.txt 
$ cat unmatched.txt 
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt  21:16
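  • The logic is the same in both: awk fills array a with the required column of the first file argument
  • It then prints to the required output file depending on whether the filename from the second file exists in a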
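A commented, self-contained sketch of the two awk commands above, assuming a POSIX sh with `mktemp -d` available. It recreates the sample files in a scratch directory, using the rows as they appear in this answer's output (without the `+` before CHCK01), so nothing outside that directory is touched.

```shell
#!/bin/sh
# Sketch: answer 0's two awk commands, run on sample data in a scratch directory.
cd "$(mktemp -d)" || exit 1

cat > Main.txt <<'EOF'
 date      filename          timestamp space  count   status
Nov 4    CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time
EOF

cat > Today.txt <<'EOF'
 filename       time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt   09:36
AR02_20161104.txt   09:36
ifs01_20161104.txt  21:16
TRIPS11_20161104.txt 09:16
EOF

# NR==FNR holds only while awk reads the first file argument (Today.txt):
# each filename ($1) becomes a key of array a, and "next" skips the rest.
# In Main.txt the filename is $3, because the date "Nov 4" spans two fields.
awk 'NR==FNR { a[$1]; next } $3 in a { print > "matched.txt" }' Today.txt Main.txt

# Roles swapped: load the Main.txt filenames, then print the Today.txt lines
# (FNR>1 skips its header) whose filename is not a key of a.
awk 'NR==FNR { a[$3]; next } !($1 in a) && FNR > 1 { print > "unmatched.txt" }' Main.txt Today.txt

cat matched.txt unmatched.txt
```

Note that the idiom assumes the first file is non-empty: if it were empty, NR==FNR would stay true while reading the second file and nothing would be printed.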


Using a combination of grep and awk:

$ grep -Ff <(awk 'NR>1{print $1}' Today.txt) Main.txt 
Nov 4    CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

$ grep -vFf <(awk 'NR>1{print $3}' Main.txt) Today.txt | tail -n+2
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt  21:16
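One behavioural difference between the grep and awk variants is easier to see in isolation: `grep -F -f` treats every pattern as a fixed string and accepts it anywhere in the line, whereas awk's `$3 in a` compares the whole third field. The sketch below assumes a POSIX sh; the `XAR01_20161104.txt` row is hypothetical, added only to provoke a substring false positive, and `-w` is a GNU/BSD grep extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: fixed-string substring matching (grep -F) versus exact field matching (awk).
cd "$(mktemp -d)" || exit 1

printf '%s\n' CHCK01_20161104.txt AR01_20161104.txt > names.txt

# The +CHCK01 row mirrors the question's Main.txt; the XAR01 row is hypothetical.
cat > Main.txt <<'EOF'
Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time
Nov 4    XAR01_20161104.txt   09:40   0.01M  5        On_Time
EOF

# -F: patterns are literal strings matched anywhere in the line, so all three rows match.
grep -F -f names.txt Main.txt > substring.txt

# -w also requires non-word characters (or the line edges) around the match,
# so XAR01 is rejected while +CHCK01 is still accepted ("+" is not a word character).
grep -Fw -f names.txt Main.txt > whole_word.txt

# awk compares the entire third field, so "+CHCK01_20161104.txt" does not match.
awk 'NR==FNR { a[$1]; next } $3 in a' names.txt Main.txt > exact_field.txt

wc -l substring.txt whole_word.txt exact_field.txt
```

This is also why the question's `+CHCK01_20161104.txt` row would be picked up by the grep commands but not by the awk ones, which compare `$3` exactly.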

Answer 1 (score: 0)

awk to the rescue!

$ awk 'FNR==1{next} 
      NR==FNR{a[$1]=$2; next} 
      $3 in a{print; delete a[$3]} 
          END{for(k in a) print k,a[k] > "unmatched"}' today main > matched

$ head *matched

==> matched <==
Nov 4    CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

==> unmatched <==
ifs01_20161104.txt 21:16
CHCK09_20161104.txt 21:46
CHCK05_20161104.txt 11:10
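The unmatched lines above come out in a different order from today because awk's `for (k in a)` loop visits keys in an unspecified, implementation-dependent order. If the original order matters, one option is to record the input position of each filename and walk that list in END. This is a sketch assuming a POSIX awk; the array names `stamp` and `order` are introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: answer 1's single awk pass, changed so "unmatched" keeps the order of today.
cd "$(mktemp -d)" || exit 1

cat > today <<'EOF'
 filename       time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt   09:36
AR02_20161104.txt   09:36
ifs01_20161104.txt  21:16
TRIPS11_20161104.txt 09:16
EOF

cat > main <<'EOF'
 date      filename          timestamp space  count   status
Nov 4    CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time
EOF

awk '
  FNR == 1 { next }                        # skip the header line of each file
  NR == FNR {                              # first file (today): name, time, position
      if (!($1 in stamp)) order[++n] = $1  # first occurrence fixes the position
      stamp[$1] = $2
      next
  }
  $3 in stamp { print; delete stamp[$3] }  # second file (main): matched rows to stdout
  END {                                    # keys still present were never matched
      for (i = 1; i <= n; i++)
          if (order[i] in stamp)
              print order[i], stamp[order[i]] > "unmatched"
  }' today main > matched

cat matched unmatched
```

The matched file still follows the order of main, exactly as in the original answer; only the END loop changes.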

Answer 2 (score: 0)

Here is an answer that uses the power of pipes.

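tail -n +2 /tmp/today | while read -r a b; do
    if ! grep -F -- "$a" /tmp/main >> /tmp/matched; then
        echo "$a $b"
    fi
done > /tmp/unmatched

Explanation

Print /tmp/today except its first (header) line:

tail -n +2 /tmp/today

Read each line into two variables, the filename and the time (-r keeps backslashes literal):

while read -r a b

grep for $a in /tmp/main and append any matching lines to /tmp/matched. -F makes grep treat the filename as a fixed string, so its "." is not a regex wildcard, and -- guards against names starting with "-". Because >> appends, delete /tmp/matched before re-running:

grep -F -- "$a" /tmp/main >> /tmp/matched

If grep returns non-zero (no matching line), echo $a and $b; the redirection after done sends this to /tmp/unmatched:

echo "$a $b"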

Output:

root@do:~# cat /tmp/matched
Nov 4    CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
root@do:~# cat /tmp/unmatched
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
root@do:~#