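I want to ignore every line of a file that contains a word listed in remove.txt.
Even though remove.txt contains the word privacy, the command below does not work no matter how I modify it.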
cat remove.txt |
perl -n0E 's/\n/|/g; say "print unless m!@($_=)\\b!i\n" ' > AUX
perl -n AUX Filelist.txt > outfile
Here is a sample of my data:
"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"copperhill*****omes**","pg***trj@whoisprivacyprotect.com","***ox *39","Kir**and","WA","9***3","UNITED STATES"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"
As you can see, the second entry contains the word privacy, so it should not appear in the output.
In the end, I want to get this:
"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"
Answer 0 (score: 0)
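From what I understand from your last comment, you have a file named Filelist.txt that contains this data:
"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"copperhill*****omes**","pg***trj@whoisprivacyprotect.com","***ox *39","Kir**and","WA","9***3","UNITED STATES"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"
You want to remove every entry that contains a word listed in the file remove.txt. That file looks like this:
privacy
standard
lucy
Any entry that matches one of these words will be skipped.
If I follow your approach, you first want to build a regular expression from the words in remove.txt, and then apply that regular expression to your file.
So, from the remove.txt shown above, we want to get this:
m/(privacy|standard|lucy)/i
We need quoting to make sure that words such as remo\nve are escaped correctly:
$ perl -ne 'chomp; push @words, quotemeta; END{print "m/(".join("|",@words).")/i"}' remove
m/(privacy|standard|lucy)/i
If you do not need to escape special characters, you can use:
$ cat remove.txt | tr '\n' '|' | awk '{print "m/("$0")/i"}'
m/(privacy|standard|lucy)/i
Your second step is to apply this regular expression to your data (the command below assumes the expression from the first step was saved in a file named regex):
$ cat filelist.txt | perl -ne "print unless `cat regex`"
"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"
However, grep may be simpler:
$ cat filelist.txt | egrep -v `cat remove.txt | tr '\n' '|'`
In all of this, make sure that remove.txt has no trailing \n. Otherwise the regular expression becomes m/(a|b|c|)/i instead of m/(a|b|c)/i, and the empty alternative matches everything.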
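The trailing-newline caveat can be sidestepped while the pattern is being built. Below is a minimal shell sketch, not part of the answer above, that assumes a paste supporting -s and -d (both specified by POSIX). The temporary directory and the variable names pattern and kept are illustrative only. Blank lines are dropped first, and paste places the "|" only between words, so no empty alternative can appear.

```shell
#!/bin/sh
# Sketch only: the temp directory and variable names are illustrative.
workdir=$(mktemp -d)

# Word list ending in a newline plus an extra blank line. Joining these
# lines with tr '\n' '|' would leave a trailing "|" in the pattern.
printf 'privacy\nstandard\nlucy\n\n' > "$workdir/remove.txt"

# Sample rows copied from the question.
cat > "$workdir/Filelist.txt" <<'EOF'
"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"copperhill*****omes**","pg***trj@whoisprivacyprotect.com","***ox *39","Kir**and","WA","9***3","UNITED STATES"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"
EOF

# Drop blank lines, then join the remaining words with "|".
# paste -s puts the delimiter between lines only, never at the end.
pattern=$(grep -v '^$' "$workdir/remove.txt" | paste -s -d '|' -)
echo "$pattern"

# -E extended regex, -i ignore case, -v print only non-matching lines.
kept=$(grep -E -i -v "$pattern" "$workdir/Filelist.txt")
printf '%s\n' "$kept"

rm -rf "$workdir"
```

The words are still interpreted as regular expression syntax here, so this sketch only suits word lists without special characters. For literal matching, grep can also read one pattern per line from a file with -f, combined with -F, which avoids joining the words at all.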
Answer 1 (score: 0)
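I would do it in a similar way, but all in a single Perl program.
Here is an example. It expects both input files as arguments on the command line, so you would run it as
perl program.pl remove.txt Filelist.txt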
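use strict;
use warnings;
use 5.010;
use autodie;

# Read the words to remove (first argument), one per line
my @lines = do {
    open my $fh, '<', $ARGV[0];
    <$fh>;
};
chomp @lines;

# Combine the words into a single alternation
my $re = join '|', @lines;
$re = qr/(?:$re)/;

# Print every line of the data file (second argument) that does not match
open my $fh, '<', $ARGV[1];
while ( <$fh> ) {
    print unless /$re/;
}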
Output:
"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"