Comparing a text file against other files

Time: 2013-06-01 07:26:19

Tags: awk

I have a file named file.txt that looks like this:

12   2
15   7
134  8
154  12
155  16
167  6
175  45
45   65
812  54

I have five other files named A.txt, B.txt, C.txt, D.txt, and E.txt. Their contents are shown below.

 A.txt 
  45
  134

 B.txt
  15
  812
  155

 C.txt
  12
  154 

 D.txt
  175    

 E.txt
  167

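For each value in the first column of file.txt, I need to find which of these files contains it and print that file's name as a third column.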

Desired output:

12   2   C
15   7   B
134  8   A
154  12  C
155  16  B
167  6   E
175  45  D
45   65  A
812  54  B

3 answers:

Answer 0: (score: 3)

This should work:

One-liner:

awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) }1' {A..E}.txt file.txt

Formatted with comments:

awk '
# Check if the filename is not that of the main file
FILENAME != "file.txt" {
    # Create a hash. Store column 1 values of the lookup files as keys and assign the filename as the value
    a[$1] = FILENAME
    # Skip the rest of the rules for this line
    next
}
# Check whether the first column of the main file is a key in the hash
$1 in a {
    # If the key exists, assign its value (the filename) as column 3 of the main file
    $3 = a[$1]
    # Using the sub function, strip the extension from the filename, as in the desired output
    sub(/\..*/, "", $3)
}
# 1 is a non-zero value, forcing awk to print. {A..E} is brace expansion of the lookup files.
1' {A..E}.txt file.txt

Note: the main file must be passed last, so that the hash is fully built before file.txt is read.

Test:

[jaypal:~/Temp] awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) ; printf "%-5s%-5s%-5s\n",$1,$2,$3}' {A..E}.txt file.txt
12   2    C
15   7    B
134  8    A
154  12   C
155  16   B
167  6    E
175  45   D
45   65   A
812  54   B
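The note about argument order suggests a small variation: instead of hard-coding "file.txt", the script can treat whatever file is passed last as the main file by comparing FILENAME with ARGV[ARGC-1]. The following is only a sketch under that assumption, not part of the answer above. The scratch directory, the array name owner, and the variable result are illustrative, and the sample data is recreated from the question.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '12   2' '15   7' '134  8' '154  12' '155  16' \
              '167  6' '175  45' '45   65' '812  54' > file.txt
printf '%s\n' 45 134     > A.txt
printf '%s\n' 15 812 155 > B.txt
printf '%s\n' 12 154     > C.txt
printf '%s\n' 175        > D.txt
printf '%s\n' 167        > E.txt

# Assumption: the last command-line argument is the main file and
# every file before it is a lookup file.
result=$(awk '
    FILENAME != ARGV[ARGC-1] {      # still reading a lookup file
        name = FILENAME
        sub(/\..*/, "", name)       # strip the extension: "A.txt" becomes "A"
        owner[$1] = name            # map value to lookup file name
        next
    }
    # Main file: print both columns, then the matching name (empty if none).
    { printf "%-5s%-5s%s\n", $1, $2, owner[$1] }
' A.txt B.txt C.txt D.txt E.txt file.txt)

printf '%s\n' "$result"
```

Listing the lookup files explicitly rather than using {A..E} keeps the command usable in shells without brace expansion. If a value appears in more than one lookup file, the file read last wins, as in the original one-liner.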

Answer 1: (score: 1)

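#! /usr/bin/awk -f

# Main file: remember every line, indexed by its line number.
FILENAME == "file.txt" {
    a[FNR] = $0;
    c = FNR;    # number of lines read from file.txt so far
}

# Lookup files: map each value to the filename without its extension.
FILENAME != "file.txt" {
    split(FILENAME, name, ".");
    k[$1] = name[1];
}

# Once all input has been read, print each stored line followed by the matching name.
END {
    for (line = 1; line <= c; line++) {
        split(a[line], seg, FS);
        print a[line], k[seg[1]];
    }
}

# $ awk -f script.awk *.txt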

Answer 2: (score: 0)

This solution does not preserve the original line order of file.txt:

join <(sort file.txt) \
     <(awk '
            FNR==1 {filename = substr(FILENAME, 1, length(FILENAME)-4)} 
            {print $1, filename}
       ' [ABCDE].txt |
       sort) |
column -t
12   2   C
134  8   A
15   7   B
154  12  C
155  16  B
167  6   E
175  45  D
45   65  A
812  54  B
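If the original order matters, one possible workaround is to tag each line of file.txt with its line number before sorting, then sort the joined result by that number and drop it again. The following is a sketch of that idea rather than part of the answer above. It uses temporary files instead of process substitution so that it runs in a plain POSIX sh, and the names main.sorted, lookup.sorted, and result are illustrative.

```shell
#!/bin/sh
# Use one collation order for both sort and join.
LC_ALL=C
export LC_ALL

# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '12   2' '15   7' '134  8' '154  12' '155  16' \
              '167  6' '175  45' '45   65' '812  54' > file.txt
printf '%s\n' 45 134     > A.txt
printf '%s\n' 15 812 155 > B.txt
printf '%s\n' 12 154     > C.txt
printf '%s\n' 175        > D.txt
printf '%s\n' 167        > E.txt

# Decorate: append the original line number as a third field, then sort on the key.
awk '{ print $1, $2, NR }' file.txt | sort -k1,1 > main.sorted

# Lookup table: one "value name" pair per line, sorted on the key.
awk '{ name = FILENAME; sub(/\.txt$/, "", name); print $1, name }' [ABCDE].txt |
    sort -k1,1 > lookup.sorted

# Join on field 1, restore the original order by sorting on the line number,
# then drop the line number again.
result=$(join main.sorted lookup.sorted |
         sort -k3,3n |
         awk '{ print $1, $2, $4 }')

printf '%s\n' "$result"
```

The output is separated by single spaces; piping it through column -t, as in the answer, would align the columns where that utility is available.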