Merging 2 files using command-line tools

Time: 2015-02-17 08:13:39

Tags: csv awk sed command-line-interface

I have 2 csv files that look like this:

id, name, job
1, bob, fireman
3, alice, nurse
7, peter, policeman
...

id, name, age
2, john, 26
4, craig, 32
5, mary, 45
6, lucy, 23
...

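As you can see, both are sorted by ID, and the IDs missing from the first csv are actually in the second csv.

Is it possible to merge these two csv files with a command-line tool (e.g. awk or something similar) into a single file that looks like this?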

id, name, job, age
1, bob, fireman,
2, john, , 26
3, alice, nurse,
4, craig, , 32
...

Thanks a lot for your help!

1 answer:

Answer 0: (score: 2)

This should do:

awk -F, -v OFS=, 'FNR==NR && FNR>1 {a[$1]=$0;c++;next} FNR>1{$NF=" ,"$NF;a[$1]=$0;c++} END {print "id, name, job, age";for (i=1;i<=c;i++) print a[i]}' file1 file2
id, name, job, age
1, bob, fireman
2, john, , 26
3, alice, nurse
4, craig, , 32
5, mary, , 45
6, lucy, , 23
7, peter, policeman
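
Note that the command above prints a[1] through a[c], so it works when the IDs across both files form a gapless run starting at 1, and it leaves off the trailing comma that the question shows for rows from the first file. The following is only a minimal sketch of a variant that tolerates gaps and adds that trailing comma. It assumes IDs are positive integers; the variable names max, tmp and merged are introduced here for illustration, and the sample files are recreated in a scratch directory.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'id, name, job' '1, bob, fireman' '3, alice, nurse' '7, peter, policeman' > "$tmp/file1"
printf '%s\n' 'id, name, age' '2, john, 26' '4, craig, 32' '5, mary, 45' '6, lucy, 23' > "$tmp/file2"

merged=$(awk -F, -v OFS=, '
FNR==1 { next }                          # skip the header line of each file
{ if ($1+0 > max) max = $1+0 }           # remember the highest ID seen so far
FNR==NR { a[$1+0] = $0 ","; next }       # file1: keep job, leave age empty
{ $NF = " ," $NF; a[$1+0] = $0 }         # file2: insert an empty job column
END {
    print "id, name, job, age"
    for (i = 1; i <= max; i++)           # walk the IDs in numeric order
        if (i in a) print a[i]           # skip IDs that appear in neither file
}' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$merged"
```

With the sample data this prints the header followed by rows 1 to 7, where rows from file1 end in a comma (for example "1, bob, fireman,") and rows from file2 keep the empty job column (for example "2, john, , 26").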

How it works:

awk -F, -v OFS=, '              # Set the input and output field separators to ","
FNR==NR && FNR>1 {              # For the first file, skipping its header line:
    a[$1]=$0                    # Store the record in array "a", keyed by ID
    c++                         # Increment "c" for every record
    next}                       # Skip to the next record
FNR>1 {                         # For the second file, skipping its header line:
    $NF=" ,"$NF                 # Prefix the last field with an extra "," (empty job column)
    a[$1]=$0                    # Store the record in array "a", keyed by ID
    c++}                        # Increment "c" for every record
END {                           # When all files have been read:
    print "id, name, job, age"  # Print the header
    for (i=1;i<=c;i++)          # Loop over IDs 1..c (assumes the IDs have no gaps)
        print a[i]}             # Print the record stored for each ID
' file1 file2                   # Read the files
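
When reading multiple files, FNR==NR is commonly used to tell which file is being processed: NR counts records across all input, FNR restarts at 1 for each file, so the two are equal only while awk is reading the first file.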
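
To see how NR and FNR behave across files, here is a small illustrative sketch. The file names first.txt and second.txt and their contents are made up for this demo and are not part of the original question.

```shell
# Two throwaway input files in a scratch directory (hypothetical names).
tmp=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$tmp/first.txt"
printf '%s\n' 'b1' 'b2' 'b3' > "$tmp/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file,
# so FNR==NR holds only while the first file is being read.
counters=$(awk '{
    print "NR=" NR, "FNR=" FNR, (FNR==NR ? "first" : "second"), $0
}' "$tmp/first.txt" "$tmp/second.txt")

printf '%s\n' "$counters"
```

The third output line reads "NR=3 FNR=1 second b1": FNR has reset for the second file while NR has not. The idiom does assume the first file is not empty; otherwise FNR==NR would also be true for the second file.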