I have 1.txt:
hi aa my name is bb tom
how are you cc today
I have 2.txt (the words I don't want):
aa
bb
cc
My expected output is:
hi my name is tom
how are you today
What I have tried so far:
sed -e "s/$(sed 's:/:\\/:g' 2.txt)/ /"
or
grep -Fvf 2.txt 1.txt
I have more than 100 unwanted words, so I need a one-liner. Thanks.
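Answer 0 (score: 1)
Without normalizing the spaces (the blanks around each removed word are left behind; `\b` is a GNU sed word boundary):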
$ sed -f <(sed 's/.*/s_\\b&\\b__g/' remove_list) file
hi  my name is  tom
how are you  today
aardwark
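To see why this works, it can help to look at the script the inner sed generates. The following is a minimal sketch, assuming GNU sed (for `\b`) and the three sample words from the question. The file name `remove.sed` is only illustrative, and the script is written to a file rather than passed through `<(...)` so that it also runs in shells without process substitution.

```shell
# Recreate the sample inputs (question data plus the aardwark test line).
printf '%s\n' aa bb cc > remove_list
printf '%s\n' 'hi aa my name is bb tom' 'how are you cc today' 'aardwark' > file

# Turn every word into one substitution command, e.g. aa -> s_\baa\b__g
sed 's/.*/s_\\b&\\b__g/' remove_list > remove.sed
cat remove.sed

# Apply the generated script. Only whole words are removed, so "aardwark"
# survives, and the spaces around each removed word stay in place.
result=$(sed -f remove.sed file)
printf '%s\n' "$result"
```

Because only the word itself is deleted, each removal leaves two adjacent spaces behind, which is what "without normalizing the spaces" refers to.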
To handle the spaces as well as the word boundaries, another option is:
$ sed -f <(sed 's/.*/s_ &\\b__g;s_\\b& __g;s_\\b&\\b__g/' remove) file
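As a rough illustration of what this variant generates, here is a sketch under the same assumptions (GNU sed, the sample data from the question, and an illustrative script file named `remove.sed`). Each word now produces three substitutions: the word with the space before it, the word with the space after it, and the bare word.

```shell
# Sample inputs from the question.
printf '%s\n' aa bb cc > remove
printf '%s\n' 'hi aa my name is bb tom' 'how are you cc today' > file

# Per word, three substitutions are tried in order:
#   " aa\b"   the word together with the space before it
#   "\baa "   the word together with the space after it
#   "\baa\b"  the word on its own (for example alone on a line)
sed 's/.*/s_ &\\b__g;s_\\b& __g;s_\\b&\\b__g/' remove > remove.sed
head -n 1 remove.sed

# Apply the generated script; no doubled spaces remain for this input.
result=$(sed -f remove.sed file)
printf '%s\n' "$result"
```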
At this point, though, it is time to switch to awk.
Answer 1 (score: 0)
A gawk solution (`\<` and `\>` are GNU awk word-boundary operators):
awk 'NR==FNR{ a[$0]; next }{ for(i in a) gsub("\\<"i"\\> *","",$0) }1' 2.txt 1.txt
Output:
hi my name is tom
how are you today
a[$0]
- collects the words that should be removed from each sentence
gsub("\\<"i"\\> *","",$0)
- replaces each "unwanted" word (matched as a whole word), together with any spaces that follow it, with an empty string
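If gawk is not available, the same idea can be sketched with field comparison instead of regex word boundaries. This is an alternative technique, not the answer's code: it assumes words are separated by whitespace, and the array name `drop` and variable `out` are illustrative.

```shell
# Recreate the sample files from the question.
printf '%s\n' aa bb cc > 2.txt
printf '%s\n' 'hi aa my name is bb tom' 'how are you cc today' > 1.txt

result=$(awk '
  NR == FNR { drop[$0]; next }          # first file: remember each unwanted word
  {
    out = ""
    for (i = 1; i <= NF; i++)           # second file: walk the fields of the line
      if (!($i in drop))                # keep only fields that are not in the list
        out = (out == "" ? $i : out " " $i)
    print out
  }' 2.txt 1.txt)
printf '%s\n' "$result"
```

This does one array lookup per field rather than one `gsub` per unwanted word, which may matter with a list of 100+ words. The trade-offs are that runs of whitespace are collapsed to single spaces and that a word with attached punctuation (such as `aa,`) is not treated as a match.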