Remove lines from an input file that match patterns listed in another file, which may be empty in some cases?

Asked: 2016-04-01 09:02:42

Tags: regex perl perl-module

Scenario 1:

File1 (the length of this file varies, and it may sometimes be empty):

exclude1
exclude2  
exclude3

File2:

statement1 that has no excludes
statement2 that has exclude3
statement3 that has no excludes
statement4 that has no excludes
statement5 that has exclude1
statement6 that has exclude2
statement7 that has no excludes

Output:

statement1 that has no excludes
statement3 that has no excludes
statement4 that has no excludes
statement7 that has no excludes

Scenario 2:

File1 (empty file):

empty file

File2:

statement1 that has no excludes
statement2 that has no excludes
statement3 that has no excludes
statement4 that has no excludes

Output:

statement1 that has no excludes
statement2 that has no excludes
statement3 that has no excludes
statement4 that has no excludes

Script:

open (IN58, "<file2.txt") or die;
open (IN59, "<file1.txt") or die;
open (OUT42, ">output.txt") or die;
my @excludes = <IN59>;
chomp @excludes;
my $excludes = join ' |',@excludes;
while (<IN58>) {
next if /${excludes}/;
print OUT42 $_ ;
}
close (IN58);
close (IN59);
close (OUT42);

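This script works for scenario 1, but when the exclude file (file1) is empty it produces an empty output file and does not work properly. Any corrections to the code would be very helpful.

2 answers:

Answer 0 (score: 2)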

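The trick here is to make your test for excludes more efficient. You can do this by building a regex from your keywords and then rejecting any line that matches at all.

So: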

#!/usr/bin/perl
use strict;
use warnings;

my @excludes = qw ( exclude1
    exclude2
    exclude3 );

my $exclude_regex = join( "|", map {quotemeta} @excludes );
$exclude_regex = qr/$exclude_regex/;


while (<DATA>) {
    print unless /$exclude_regex/;
}


__DATA__
statement1 that has no excludes
statement2 that has exclude3
statement3 that has no excludes
statement4 that has no excludes
statement5 that has exclude1
statement6 that has exclude2
statement7 that has no excludes

Now, the problem here is of course that an empty pattern effectively matches anything: you are 'wildcarding' your match (and excluding everything).
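To see the effect in isolation, here is a minimal sketch. The helper name `build_naive_regex` is made up for illustration, and the two sample lines are borrowed from the question; it is a demonstration of the failure, not a fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: joins the exclude terms into one alternation,
# with no protection against an empty list.
sub build_naive_regex {
    my @excludes = @_;
    my $pattern = join( "|", map {quotemeta} @excludes );
    return qr/$pattern/;
}

my $no_terms   = build_naive_regex();    # what an empty file1 produces
my $with_terms = build_naive_regex( "exclude1", "exclude2" );

for my $line ( "statement1 that has no excludes", "statement5 that has exclude1" ) {
    # The empty alternation is tested first, before any other match.
    my $empty_result = $line =~ $no_terms   ? "dropped" : "kept";
    my $terms_result = $line =~ $with_terms ? "dropped" : "kept";
    print "$line\n  no terms: $empty_result, two terms: $terms_result\n";
}
```

With no terms the regex has nothing to look for, so it succeeds on every line and both statements are reported as dropped, which is the empty output file described in the question.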

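The easiest way of handling this is to insert a 'default' pattern that never matches a real statement, for example one that only matches a blank line: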

my $exclude_regex = join( "|", '^$', map {quotemeta} (  @excludes ) ) ;

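This will filter out blank lines as well as anything containing one of your exclude words, and it produces a regex that looks like this: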

(?^:^$|exclude1|exclude2|exclude3)
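If blank lines in file2 should be kept, one alternative is sketched below. This is a swapped-in technique, not what the answer uses: an empty list yields `(?!)`, an empty negative lookahead that cannot succeed at any position. The sketch also assumes that blank entries in the exclude file should be ignored, since an empty alternative matches every line just like an empty list does. The helper name `build_exclude_regex` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the answer above.
sub build_exclude_regex {
    # Drop empty or whitespace-only entries: an empty alternative
    # would match every line.
    my @terms = grep { /\S/ } @_;

    # (?!) is an empty negative lookahead; it fails at every position,
    # so nothing is excluded when there are no usable terms.
    return qr/(?!)/ unless @terms;

    my $pattern = join( "|", map {quotemeta} @terms );
    return qr/$pattern/;
}

# The empty string stands in for a blank line in the exclude file.
my $regex = build_exclude_regex( "exclude1", "", "exclude3" );

for my $line ( "statement1 that has no excludes", "statement2 that has exclude3" ) {
    print "$line\n" unless $line =~ $regex;
}
```

The design choice is between the two placeholders: `^$` also removes blank lines from the data, while `(?!)` removes nothing at all.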

Adding the file reading bits:

#!/usr/bin/perl
use strict;
use warnings;

open( my $data,     '<', "file2.txt" )  or die;
open( my $excludes, '<', "file1.txt" )  or die;
open( my $output,   '>', "output.txt" ) or die;

chomp( my @excludes = <$excludes> );
my $exclude_regex = join( "|", '^$', map {quotemeta} (@excludes) );
$exclude_regex = qr/$exclude_regex/;
print $exclude_regex, "\n";

select $output;
while (<$data>) {
    print unless m/$exclude_regex/;
}

Because you seem to have a space in the string you join your regex with, you might want to consider changing the exclude regex to:

$exclude_regex = qr/\b$exclude_regex\b/;

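This adds word boundaries to the pattern match (although it slightly breaks the 'blank line' alternative, which no longer matches anything, it still works as a placeholder).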
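A small sketch of what the word boundaries change. The helper `exclude_regex` and the sample line containing `exclude10` are invented for illustration; the construction otherwise follows the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same construction as above, with optional \b anchors.
sub exclude_regex {
    my ( $use_boundaries, @excludes ) = @_;
    my $alternation = join( "|", '^$', map {quotemeta} @excludes );
    my $regex = qr/$alternation/;

    # Interpolating the compiled qr// keeps the alternation grouped,
    # so \b applies to the whole group rather than to the first and
    # last alternatives only.
    return $use_boundaries ? qr/\b$regex\b/ : $regex;
}

my $plain   = exclude_regex( 0, "exclude1" );
my $bounded = exclude_regex( 1, "exclude1" );

for my $line ( "statement5 that has exclude1", "statement8 that has exclude10", "" ) {
    printf "%-33s plain: %s, bounded: %s\n", "'$line'",
        ( $line =~ $plain   ? "drop" : "keep" ),
        ( $line =~ $bounded ? "drop" : "keep" );
}
```

The plain regex drops all three lines. The bounded one drops only the first: `exclude10` is no longer treated as a hit for `exclude1`, and the empty line is kept because `\b` needs a word character next to it.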

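While we're here:

  • Three-argument open with lexical file handles is good.
  • use strict; use warnings; should be considered mandatory.
  • Consider what happens if the exclude file contains regex metacharacters. That is why quotemeta is there, to treat them as literals... but you may find it useful to support regexes in the exclude file.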
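To make the last point concrete, here is a sketch comparing the two ways of treating an exclude entry. The entry `v1.5`, the sample lines, and the helper names `literal_regex` and `pattern_regex` are all invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Treat the entry as literal text: \Q...\E escapes metacharacters,
# which has the same effect as calling quotemeta on it.
sub literal_regex {
    my ($entry) = @_;
    return qr/\Q$entry\E/;
}

# Treat the entry as a regex in its own right. The eval catches
# entries that are not valid patterns.
sub pattern_regex {
    my ($entry) = @_;
    my $regex = eval { qr/$entry/ };
    die "Bad pattern '$entry' in exclude file: $@" unless defined $regex;
    return $regex;
}

my $entry = "v1.5";
for my $line ( "release v1.5 notes", "release v105 notes" ) {
    printf "%-20s literal: %s, pattern: %s\n", $line,
        ( $line =~ literal_regex($entry) ? "drop" : "keep" ),
        ( $line =~ pattern_regex($entry) ? "drop" : "keep" );
}
```

As a pattern, the `.` in `v1.5` matches any character, so `v105` is dropped too; as a literal, only the exact text `v1.5` is.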

Answer 1 (score: 0)

Try this

Here I used a negated grep to keep the lines of the file that do not match, together with a check of whether the exclude file is empty.

If the file is empty, the condition is not satisfied, so the values guarded by the if condition are left unchanged. If the condition is satisfied, the values in @br are replaced with the new values.

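The code for this answer is not present here, so the following is only a sketch of the approach as described, not the answerer's script. It assumes the exclude terms and the statements have already been read into arrays, and it tests whether the term list is empty in place of a check on the file itself. The name `filter_statements` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes two array references and returns the
# statements that contain none of the exclude terms.
sub filter_statements {
    my ( $excludes, $statements ) = @_;
    my @kept  = @$statements;
    my @terms = grep { length } @$excludes;    # ignore empty entries

    # Only filter when there is at least one exclude term; with an
    # empty list the statements pass through unchanged.
    if (@terms) {
        my $pattern = join( "|", map {quotemeta} @terms );
        @kept = grep { !/$pattern/ } @kept;
    }
    return @kept;
}

my @statements = (
    "statement1 that has no excludes",
    "statement2 that has exclude3",
);

print "$_\n" for filter_statements( [],           \@statements );
print "$_\n" for filter_statements( ["exclude3"], \@statements );
```

Skipping the grep entirely when there are no terms avoids the empty-pattern problem without needing a placeholder alternative.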