How to compare two CSV files in UNIX and create a delta (modified/new records)

Time: 2017-03-22 16:26:46

Tags: unix awk

I have two CSV files, old.csv and new.csv. I need only the new or updated records from new.csv: if a record already exists in old.csv, it should be removed from new.csv.

old.csv

"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"

new.csv

"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"

Expected output in new.csv:

"R","abc","london","5678"
"V","Bell","tokyo","2222"

Note: if every record in new.csv is identical to one in old.csv, new.csv should end up empty.

2 answers:

Answer 0 (score: 4)

For example, using grep:

$ grep -v -f old.csv new.csv # > the_new_new.csv 
"R","abc","london","5678"
"V","Bell","tokyo","2222"

$ grep -v -f old.csv old.csv
$                            # no output: two identical files have no differences

From man grep:

  -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

  -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)
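The -f patterns above are basic regular expressions that may match anywhere in a line. A line of old.csv that contains regex metacharacters (such as . or *), or that happens to be a substring of a longer line in new.csv, can therefore hide a record that should be reported. A minimal sketch that adds -F (fixed strings) and -x (whole-line match), both also specified by POSIX; the scratch directory and the output name delta.csv are illustrative assumptions:

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' '"R","abc","london","1234567"' '"S","def","london","1234567"' \
  '"T","kevin","boston","9876"' '"U","krish","canada","1234567"' > old.csv
printf '%s\n' '"R","abc","london","5678"' '"S","def","london","1234567"' \
  '"T","kevin","boston","9876"' '"V","Bell","tokyo","2222"' > new.csv

# -F: each line of old.csv is a literal string, not a regular expression
# -x: a pattern only matches when it equals the whole line
# -v: print the lines of new.csv that match none of the patterns
# -f old.csv: read the patterns from old.csv, one per line
grep -vxFf old.csv new.csv > delta.csv
cat delta.csv
```

If no line is selected (the "all records identical" case), grep exits with status 1, so avoid chaining a follow-up step such as mv with && after it.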

Alternatively, you can use awk:

$ awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"

Explanation:

awk '
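NR==FNR{            # NR (total records) equals FNR (records in this file) while reading the first file
    a[$0]           # store each line of old.csv as a key of array a
    next            # skip the remaining rule for lines of old.csv
} 
!($0 in a)          # for new.csv: print the lines that are not keys of a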
' old.csv new.csv   # > the_new_new.csv 
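One caveat with NR==FNR: if old.csv is empty, NR also equals FNR for every line of new.csv, so those lines are hashed instead of printed and the output is wrongly empty. A sketch, assuming a POSIX awk, that tests FILENAME against ARGV[1] instead and writes the result back into new.csv as the question asks. The array name seen and the temporary file new.csv.tmp are illustrative choices; the temporary file is needed because "> new.csv" would make the shell truncate new.csv before awk reads it:

```shell
#!/bin/sh
# Scratch directory with the sample data from the question
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' '"R","abc","london","1234567"' '"S","def","london","1234567"' \
  '"T","kevin","boston","9876"' '"U","krish","canada","1234567"' > old.csv
printf '%s\n' '"R","abc","london","5678"' '"S","def","london","1234567"' \
  '"T","kevin","boston","9876"' '"V","Bell","tokyo","2222"' > new.csv

# FILENAME == ARGV[1] checks which file is being read, so it stays correct
# even when old.csv has no lines. awk exits 0 even when it prints nothing,
# so the mv also runs in the "all records identical" case and empties new.csv.
awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' old.csv new.csv > new.csv.tmp &&
  mv new.csv.tmp new.csv
cat new.csv
```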

Answer 1 (score: 0)

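When both files are sorted, comm -13 suppresses column 1 (lines only in old.csv) and column 3 (lines in both files), leaving the new or changed records: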

comm -13 old.csv new.csv

If they are not sorted, sort them on the fly:

comm -13 <(sort old.csv) <(sort new.csv)
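Process substitution, <(...), is a bash, ksh and zsh feature rather than POSIX sh. A sketch of the same approach in plain sh, sorting into intermediate files (the *.sorted and delta.csv names are illustrative). Note that comm prints its output in sorted order, not in the original order of new.csv:

```shell
#!/bin/sh
# Scratch directory with the sample data from the question
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' '"R","abc","london","1234567"' '"S","def","london","1234567"' \
  '"T","kevin","boston","9876"' '"U","krish","canada","1234567"' > old.csv
printf '%s\n' '"R","abc","london","5678"' '"S","def","london","1234567"' \
  '"T","kevin","boston","9876"' '"V","Bell","tokyo","2222"' > new.csv

# comm expects input sorted in the current collation order, so use one
# fixed locale for both sort and comm
LC_ALL=C
export LC_ALL
sort old.csv > old.sorted
sort new.csv > new.sorted

# -1: hide lines only in old.sorted; -3: hide lines present in both files
comm -13 old.sorted new.sorted > delta.csv
cat delta.csv
```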