perl - Delete data lines that contain X

Time: 2015-07-19 17:28:57

Tags: perl

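I want to ignore every line of a file that contains any word listed in remove.txt.

How should I modify the command below? It does not work, even though remove.txt contains the word privacy.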

cat remove.txt | 
  perl -n0E 's/\n/|/g; say "print unless m!@($_=)\\b!i\n" ' > AUX
perl -n AUX   Filelist.txt > outfile
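One thing worth checking in this approach: `-0` with no digits sets `$/` to the null character, so the whole of remove.txt is read as a single record. If the file ends with a newline (as most text files do, which is an assumption here), `s/\n/|/g` leaves a trailing `|`, and the resulting pattern has an empty last alternative that matches every line. A minimal Perl sketch of that effect, using the word list from the answers below as hypothetical file contents:

```perl
use strict;
use warnings;

# Hypothetical contents of remove.txt, read as one record the way -0 does
my $text = "privacy\nstandard\nlucy\n";

# What s/\n/|/g does to it: the final newline becomes a trailing "|"
(my $alt = $text) =~ s/\n/|/g;                   # "privacy|standard|lucy|"

# The empty last alternative matches the empty string, so any line matches
my $any = 'FRANCE' =~ /(?:$alt)/i ? 1 : 0;       # 1

# chomp removes the final newline first, so no empty alternative is produced
chomp(my $fixed = $text);
$fixed =~ s/\n/|/g;                              # "privacy|standard|lucy"
my $none = 'FRANCE' =~ /(?:$fixed)/i ? 1 : 0;    # 0

print "$alt\n$fixed\n";
```

With `print unless ...`, a pattern that matches everything would leave the output file empty.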

Here is a sample of my data:

"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"copperhill*****omes**","pg***trj@whoisprivacyprotect.com","***ox *39","Kir**and","WA","9***3","UNITED STATES"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"

You can see that the second entry contains the word privacy, so it should not appear in the output.

So in the end I want to get this:

"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"
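Note that "privacy" occurs in the second entry only inside the domain name whoisprivacyprotect.com. If the generated pattern wraps the words in `\b` word boundaries (as the `\\b` in the command suggests), it will not match that line, because `\b` needs a word/non-word transition on each side. A small Perl check on the sample line:

```perl
use strict;
use warnings;

# Second sample line from the question
my $line = q{"copperhill*****omes**","pg***trj@whoisprivacyprotect.com","***ox *39","Kir**and","WA","9***3","UNITED STATES"};

# Plain case-insensitive match: finds "privacy" inside the domain name
my $plain = $line =~ /privacy/i ? 1 : 0;

# Whole-word match: "privacy" sits between "whois" and "protect",
# so there is no word boundary on either side and the match fails
my $bounded = $line =~ /\bprivacy\b/i ? 1 : 0;

print "plain: $plain, bounded: $bounded\n";   # plain: 1, bounded: 0
```

So, for this data, a plain substring match is what removes the line.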

2 answers:

Answer 0 (score: 0)

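From what I understood from your last comment, you have a file called Filelist.txt that contains the following data:

"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"copperhill*****omes**","pg***trj@whoisprivacyprotect.com","***ox *39","Kir**and","WA","9***3","UNITED STATES"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"

You want to remove every entry that contains one of the words listed in the file remove.txt, which looks like this:

privacy
standard
lucy

Any entry that matches one of these words is skipped.

If I follow your idea, you first want to build a regex from the words in remove.txt, and then apply that regex to your file.

So, from the remove.txt shown above, we want to get this:

m/(privacy|standard|lucy)/i

We need quotemeta to make sure that words such as remo\nve are escaped correctly:

$ perl -ne 'chomp; push @words, quotemeta; END{print "m/(".join("|",@words).")/i"}' remove.txt
m/(privacy|standard|lucy)/i

If you don't need to escape special characters, you can use:

$ cat remove.txt | tr '\n' '|' | awk '{print "m/("$0")/i"}'
m/(privacy|standard|lucy)/i

Your second step is to apply this regex to your data. Assuming the regex from the first step was saved to a file named regex (for example by appending > regex to one of the commands above):

$ cat Filelist.txt | perl -ne "print unless `cat regex`"
"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"

However, grep may be simpler:

$ cat Filelist.txt | egrep -v `cat remove.txt | tr '\n' '|'`

For the commands that use tr, note that remove.txt must not end with a trailing \n. Otherwise the regex becomes m/(a|b|c|)/i instead of m/(a|b|c)/i, and the empty last alternative matches everything.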
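To see what the quotemeta step changes, here is a small Perl check. The words a.b and c+d are hypothetical examples containing regex metacharacters; they are not from the question, where the words are plain letters and escaping makes no difference:

```perl
use strict;
use warnings;

# Hypothetical words containing regex metacharacters (not from the question)
my @words = ('a.b', 'c+d');

my $raw  = join '|', @words;                     # a.b|c+d
my $safe = join '|', map { quotemeta } @words;   # a\.b|c\+d

# Unescaped, "." matches any character, so "axb" is wrongly matched
my $raw_hit  = 'axb' =~ /(?:$raw)/  ? 1 : 0;     # 1
# Escaped, only a literal "a.b" matches
my $safe_hit = 'axb' =~ /(?:$safe)/ ? 1 : 0;     # 0
my $safe_dot = 'a.b' =~ /(?:$safe)/ ? 1 : 0;     # 1

print "$safe\n";   # a\.b|c\+d
```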

答案 1 :(得分:0)

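I would go about it in a similar way, but do it all in one Perl program.

Here is an example. It expects both input files as arguments on the command line, so you would run it as

perl program.pl remove.txt Filelist.txt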
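use strict;
use warnings;
use 5.010;
use autodie;    # open() dies with a useful message on failure

# Read the words to remove from the first file (remove.txt)
my @lines = do {
  open my $fh, '<', $ARGV[0];
  <$fh>;
};
chomp @lines;

# Build one alternation; quotemeta escapes regex metacharacters and
# grep { length } drops blank lines, which would otherwise match everything
my $re = join '|', map { quotemeta } grep { length } @lines;
$re = qr/(?:$re)/;

# Print every line of the second file (Filelist.txt) that contains none of the words
open my $fh, '<', $ARGV[1];
while ( <$fh> ) {
  print unless /$re/;
}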

Output

"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"
"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"
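The same filtering can be tried without any files. This sketch holds the question's word list and sample lines in memory, adds a blank entry to mimic an empty line in remove.txt, and uses /i as the question's own attempt did:

```perl
use strict;
use warnings;

# Words from remove.txt plus one blank entry, as an empty line in the file would give
my @words = ('privacy', 'standard', 'lucy', '');

# Sample lines from Filelist.txt in the question
my @data = (
    q{"albu*****holmes**","ab***foo@bar.com","aef" *22","Angel**or","FR","2***3","FRANCE"},
    q{"copperhill*****omes**","pg***trj@whoisprivacyprotect.com","***ox *39","Kir**and","WA","9***3","UNITED STATES"},
    q{"ironhill*****shelock**","dd***trejo@foo.com","***oxtho *42","Kiss**or","CA","2***3","UNITED STATES"},
);

# Drop empty words, escape metacharacters, match case-insensitively
my $alt = join '|', map { quotemeta } grep { length } @words;
my $re  = qr/(?:$alt)/i;

# Keep only the lines that contain none of the words
my @kept = grep { !/$re/ } @data;
print "$_\n" for @kept;
```

It prints the first and third sample lines, matching the expected output above.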