I'm not sure where to start, but I want to create a blacklist of "bad lines" and search an existing file for them.
File:
aaaba
bbbab
cccac
dddba
eeewd
pppwp
blacklist file:
ddba
bbab
...
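I want to run a case-insensitive search, based on the blacklist entries, that finds the matching lines in the text file, and then create a hits file: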
line 2: bbbab
line 4: dddba
All of the characters *,&,|,:,;,#,@,!,^,(),[],/ should be treated as ordinary ASCII characters if they appear in the blacklist or the file.
Answer 0 (score: 1)
If I understand your question correctly, index is the right tool:
use strict;
use warnings;
# Usage: foo.pl DATA_FILE BLACKLIST_FILE
my ($data_file, $blacklist_file) = @ARGV;
# Store the blacklist: lowercase, without newlines.
@ARGV = ($blacklist_file);
my @blacklist = map { chomp; lc } <>;
# Process the data.
@ARGV = ($data_file);
while (my $line = <>){
    for my $bk (@blacklist){
        # Print the line if a blacklist item is found in it.
        if ( index(lc($line), $bk) > -1 ){
            print 'line ', $., ': ', $line;
            last;
        }
    }
}
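The question also asks for a hits file rather than output on the terminal. A minimal sketch of the same index-based search, wrapped in a function and writing to a file, might look like the following. The helper name find_hits and the output name hits.txt are assumptions made for illustration, and the sample data is copied from the question.

```perl
use strict;
use warnings;

# Return "line N: text" for every line that contains a blacklist entry.
# Matching is case-insensitive and literal (index does no regex parsing).
# find_hits is a hypothetical helper name.
sub find_hits {
    my ($lines, $blacklist) = @_;
    # Lowercase the entries once and skip empty ones.
    my @needles = grep { length } map { lc } @$blacklist;
    my @hits;
    my $n = 0;
    for my $line (@$lines) {
        $n++;
        my $lc_line = lc $line;
        for my $needle (@needles) {
            if ( index($lc_line, $needle) > -1 ) {
                push @hits, "line $n: $line";
                last;    # report each line at most once
            }
        }
    }
    return @hits;
}

# Sample data from the question.
my @data      = qw(aaaba bbbab cccac dddba eeewd pppwp);
my @blacklist = qw(ddba bbab);

# hits.txt is an assumed output file name.
open my $out, '>', 'hits.txt' or die "Cannot write hits.txt: $!";
print {$out} "$_\n" for find_hits(\@data, \@blacklist);
close $out or die "Cannot close hits.txt: $!";
```

Because index compares plain strings, characters such as * or | in the blacklist need no escaping here.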
Answer 1 (score: 0)
If Perl is not a must, you can use awk:
$ awk 'FNR==NR{if(NF)b[tolower($0)];next}{s=tolower($0);for(i in b){if(index(s,i)){print;break}}}' blacklist file
bbbab
dddba
Or, if the blacklist entries must match whole lines exactly, use equality:
awk 'FNR==NR{b[$1];next}{for(i in b){if($0 == i){ print}}}' blacklist file
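For readers who want to stay in Perl, the whole-line comparison can be sketched with a hash lookup, which avoids looping over the blacklist for every line. The helper name exact_hits is an assumption for illustration, and the sketch assumes both lists have already had their newlines removed. Unlike the awk command above, it folds case on both sides.

```perl
use strict;
use warnings;

# Return the lines that equal some blacklist entry, ignoring case.
# exact_hits is a hypothetical helper name.
sub exact_hits {
    my ($lines, $blacklist) = @_;
    # Keys are the lowercased blacklist entries.
    my %is_bad = map { ( lc($_) => 1 ) } @$blacklist;
    return grep { $is_bad{ lc $_ } } @$lines;
}

my @hits = exact_hits( [qw(aaaba BBAB dddba)], [qw(ddba bbab)] );
print "$_\n" for @hits;
```

Only BBAB is an exact match here; dddba merely contains a blacklist entry, so it is not reported.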
Answer 2 (score: 0)
I would use Regexp::Assemble to build a regex from all the blacklisted words, and then use it to process the file:
use strict;
use warnings;
use Regexp::Assemble;
my $file = 'test.txt';
my $blacklist = 'blacklist.txt';
my $r = Regexp::Assemble->new( flags => 'i' );
# Prepare the regex
open my $bl, '<', $blacklist or die $!;
# chomp first; otherwise the trailing newline becomes part of each pattern.
my @blacklisted = map { chomp; quotemeta } <$bl>;
$r->add( $_ ) foreach @blacklisted;
my $regex = $r->re;
# Process the file
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
    print "line $.: $_" if /$regex/;
}
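The idea behind this answer, escaping each entry so that punctuation matches literally and then combining the entries into one case-insensitive pattern, can also be sketched with core Perl only. This is a simplified stand-in, not what Regexp::Assemble does internally; the helper name build_regex is an assumption, and the entries are assumed to be chomped already.

```perl
use strict;
use warnings;

# Build one case-insensitive regex that matches any entry literally.
# build_regex is a hypothetical helper name.
sub build_regex {
    my @entries = grep { length } @_;    # skip empty entries
    return qr/(?!)/ unless @entries;     # empty list: never match
    my $alternation = join '|', map { quotemeta } @entries;
    return qr/$alternation/i;
}

my $re = build_regex('a*b', '[x]');
print "hit\n"  if     'xxA*Byy' =~ $re;    # literal *, any case
print "miss\n" unless 'aab'     =~ $re;    # * is not a quantifier here
```

The guard for an empty list matters because an empty pattern would otherwise match every line.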
Answer 3 (score: 0)
Build a regex from the blacklist file and test each line of the data file against it.
#!/usr/bin/env perl
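use strict;
use warnings;
# Load the blacklist and build one case-insensitive regex from it.
sub make_blacklist {
    my ($path) = @_;
    open my $fd, "<", $path or die "Cannot open $path: $!";
    # quotemeta makes characters such as * | ( ) [ ] match literally.
    my $bl_re = join "|", map { chomp; quotemeta } readline $fd;
    return qr/$bl_re/i;
}
# Process the file
my $is_blacklisted = make_blacklist("blacklist.txt");
open my $data_fd, "<", "datafile.txt" or die "Cannot open datafile.txt: $!";
while ( my $line = readline $data_fd ) {
    print "line $.: $line" if $line =~ $is_blacklisted;
}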
If the blacklist is very large, this may use up all your memory.