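I have a large text file from which I want to remove the lines listed in another text file. The sed command in the Unix shell seems like a good way to do this, but I can't figure out which flags to use.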
database.txt:
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
lines_to_remove.txt:
this is line 1
this is line 3
what_i_want.txt:
this is line 2
this is line 4
this is line 5
Answer 0 (score: 6)
grep is better suited to this than sed:
grep -Fxv -f lines_to_remove.txt database.txt > what_i_really_really_want.txt
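For reference, here is a minimal sketch that reproduces the grep command on the sample data from the question. The scratch directory and the output name result.txt are assumptions made for illustration. In short, -F treats each pattern as a fixed string rather than a regular expression, -x requires the pattern to match the whole line, -v inverts the match, and -f reads the patterns from a file.

```shell
#!/bin/sh
# Work in a scratch directory (assumption: mktemp is available).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Recreate the sample files from the question.
printf '%s\n' 'this is line 1' 'this is line 2' 'this is line 3' \
    'this is line 4' 'this is line 5' > database.txt
printf '%s\n' 'this is line 1' 'this is line 3' > lines_to_remove.txt

# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
# result.txt is a hypothetical output name used only in this sketch.
grep -Fxv -f lines_to_remove.txt database.txt > result.txt

cat result.txt
```

Without -x, a pattern such as "this is line 1" would also remove a line like "this is line 10", so whole-line matching matters here.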
Answer 1 (score: 1)
In awk:
$ awk 'NR==FNR{a[$0];next}!($0 in a)' remove.txt database.txt
this is line 2
this is line 4
this is line 5
$ awk 'NR==FNR{a[$0];next}!($0 in a)' remove.txt database.txt > output.txt
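The one-liner is dense, so here is a commented sketch of the same program run on the sample data. The scratch directory and the array name seen are assumptions for illustration. The logic is unchanged from the command above.

```shell
#!/bin/sh
# Work in a scratch directory (assumption: mktemp is available).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Recreate the sample files, using the file names from this answer.
printf '%s\n' 'this is line 1' 'this is line 2' 'this is line 3' \
    'this is line 4' 'this is line 5' > database.txt
printf '%s\n' 'this is line 1' 'this is line 3' > remove.txt

awk '
    # NR == FNR holds only while the first file (remove.txt) is being read:
    # store each of its lines as an array key, then move to the next line.
    NR == FNR { seen[$0]; next }
    # For the second file (database.txt), print lines that are not keys.
    !($0 in seen)
' remove.txt database.txt > output.txt

cat output.txt
```

One caveat: if remove.txt is empty, NR == FNR is also true while database.txt is read, so nothing is printed.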
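Answer 2 (score: 1)
I would use comm:
comm -23 <(sort database.txt) <(sort lines_to_remove.txt) > what_i_want.txt
That command is better suited to your needs. The -23 flags suppress the lines unique to the second file and the lines common to both, leaving only the lines that appear in database.txt alone.
Note: the <(command) syntax is a bashism and therefore draws a lot of criticism on SO. It is shorthand for the following:
sort database.txt > sorted_database.txt
sort lines_to_remove.txt > sorted_lines_to_remove.txt
comm -23 sorted_database.txt sorted_lines_to_remove.txt > what_i_want.txt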
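As a sketch of how comm selects lines, the script below runs the temporary-file variant on the sample data from the question. comm prints three columns: lines only in the first file, lines only in the second file, and lines in both. Passing -23 suppresses the second and third columns, so only lines unique to the first file remain. The scratch directory and the output name result.txt are assumptions for illustration.

```shell
#!/bin/sh
# Work in a scratch directory (assumption: mktemp is available).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Recreate the sample files from the question.
printf '%s\n' 'this is line 1' 'this is line 2' 'this is line 3' \
    'this is line 4' 'this is line 5' > database.txt
printf '%s\n' 'this is line 1' 'this is line 3' > lines_to_remove.txt

# comm expects sorted input; plain temporary files avoid the bash-only <( ).
sort database.txt > sorted_database.txt
sort lines_to_remove.txt > sorted_lines_to_remove.txt

# -2 drops lines unique to the second file, -3 drops lines common to both,
# leaving column 1: lines found only in database.txt.
comm -23 sorted_database.txt sorted_lines_to_remove.txt > result.txt

cat result.txt
```

Because the input is sorted first, the result comes out in sorted order, which may differ from the original line order of database.txt.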