Question

I have two CSV files, old.csv and new.csv. I only need the new or updated records from new.csv: if a record already exists in old.csv, it should be removed from new.csv.

old.csv

"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"

new.csv

"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"

Expected output in new.csv:

"R","abc","london","5678"
"V","Bell","tokyo","2222"

Note: if all records in new.csv are the same as in old.csv, new.csv should end up empty.

Answer 1

For example, using grep:

$ grep -v -f old.csv new.csv # > the_new_new.csv 
"R","abc","london","5678"
"V","Bell","tokyo","2222"

And comparing a file with itself:

$ grep -v -f old.csv old.csv
$                            # see, no differences between two identical files

From man grep:

  -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

  -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)
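Because -f reads each line of old.csv as a regular expression that may match anywhere in a line, the command above can misfire on other data: a line "R","abc","london","1234" in old.csv would also hide "R","abc","london","12345" in new.csv, and a blank line in old.csv becomes an empty pattern that matches every line, so nothing would be printed at all. A minimal sketch of a stricter variant adds -F (fixed strings) and -x (whole-line matches), both specified by POSIX; the function name csv_delta_grep is hypothetical:

```shell
# csv_delta_grep (hypothetical helper): print lines of the second file
# that do not appear, as a complete literal line, in the first file.
# Usage: csv_delta_grep old.csv new.csv > the_new_new.csv
csv_delta_grep() {
    # -v: keep lines that match none of the patterns
    # -x: a pattern only matches if it equals the entire line
    # -F: patterns are fixed strings, not regular expressions
    # -f: read the patterns from the first file, one per line
    grep -vxF -f "$1" "$2"
}
```

grep exits with status 1 when it prints nothing, which here is the normal "no changes" case, so a script that chains on its exit status (for example with &&) should treat 1 as success.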

Alternatively, you can use awk:

$ awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"

Explanation:

awk '
NR==FNR{            # true only while reading the first file (old.csv):
    a[$0]           # store the whole line as a key in the hash a
    next            # skip the rest of the program for this line
}
!($0 in a)          # for new.csv: print lines that are not keys in a
' old.csv new.csv   # > the_new_new.csv
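The question wants the result to end up in new.csv itself, and empty when nothing changed. Redirecting with > new.csv in the same command would truncate new.csv before awk reads it, so a minimal sketch writes to a temporary file and then moves it over the original; update_new_csv and the "$2.tmp" file name are hypothetical. It also tests FILENAME == ARGV[1] instead of NR==FNR: if old.csv is empty, NR==FNR stays true for every line of new.csv, and the one-liner above would then print nothing instead of every record.

```shell
# update_new_csv (hypothetical helper): rewrite the second file so that it
# keeps only records that do not appear verbatim in the first file.
# Usage: update_new_csv old.csv new.csv
update_new_csv() {
    local old="$1" new="$2" tmp="$2.tmp"   # "$2.tmp" is an assumed scratch name
    # First file: remember every whole line as a key of "seen".
    # Second file: print only lines that are not keys of "seen".
    # FILENAME == ARGV[1] still works when the first file is empty.
    awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' "$old" "$new" > "$tmp" &&
        mv "$tmp" "$new"   # new.csv becomes empty if every record was unchanged
}
```

Unlike grep, awk exits with status 0 even when it prints nothing, so the && chain still replaces new.csv with an empty file in the "all records identical" case.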

Answer 2

If the files are sorted:

comm -13 old.csv new.csv

If they are not sorted, sort them first (the <( ... ) process substitution requires bash, ksh, or zsh):

comm -13 <(sort old.csv) <(sort new.csv)
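Two details of the comm approach are easy to miss: comm expects both inputs sorted in the collating order of the current locale, so running sort and comm under the same LC_ALL=C setting keeps them consistent, and the result comes out in sorted order rather than in the original order of new.csv. A minimal sketch for bash; new_records_comm is a hypothetical name:

```shell
# new_records_comm (hypothetical helper): print lines found only in the
# second file. Requires bash, ksh or zsh for <( ... ).
# Usage: new_records_comm old.csv new.csv > the_new_new.csv
new_records_comm() {
    # comm prints three columns: lines only in file 1, lines only in
    # file 2, and lines in both. -1 and -3 suppress the first and third,
    # leaving only the lines unique to the second file.
    # Sorting and comparing in the C locale keeps both steps consistent.
    LC_ALL=C comm -13 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}
```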

How to compare two CSV files in UNIX and create a delta (modified/new records)
