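(Updated example and solution) The data is grouped sequentially by the first field. The number of lines per first-field value varies. Once a match (an unwanted keyword) is found, every line that shares the matching line's first field needs to be deleted.
The input is: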
1 orange dog red
1 apple cat green
2 peach frog grey
3 apple lamb white
3 orange lamb white
3 mango cat yellow
3 apple mouse blue
If " cat" or " orange" is matched, delete the lines with the same first field ("1" or "3"). The output would be:
2 peach frog grey
Solution from Costas:
awk 'NR==FNR{if($0~/cat|orange/)L[$1]=1;next} !($1 in L)' test1.txt test1.txt
Answer 0: (score: 2)
awk '
# First pass over the input: find every occurrence of "cat"
NR==FNR{
    if( $0~/cat/)
        L[$1]=1 # record the first field of the matching line in array L
    next
}
# Second pass: print a line only if its first field is not in array L
!($1 in L)' input input
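The pattern /cat/ matches anywhere in the line, so a word such as "category" would also trigger removal. Since the question lists the keywords with a leading space (" cat", " orange"), a whole-field comparison may be closer to the intent. The following is only a sketch under that assumption; the function name filter_exact and the array name bad are made up for illustration.

```shell
# Hypothetical helper: filter_exact FILE
# Two-pass variant that compares whole fields instead of substrings.
filter_exact() {
  awk '
    BEGIN { bad["cat"] = 1; bad["orange"] = 1 }   # unwanted keywords
    # First pass: remember the first field of any line that has a
    # field (from the second one on) exactly equal to a keyword.
    NR == FNR {
      for (i = 2; i <= NF; i++)
        if ($i in bad) L[$1] = 1
      next
    }
    # Second pass: print lines whose first field was not recorded.
    !($1 in L)
  ' "$1" "$1"
}
```

Called as `filter_exact test1.txt`, it reads the file twice, just like the solution above.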
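Answer 1: (score: 1)
Print a line if it does not match the keywords and its first column is not yet in the dictionary f. Note that this is a single pass, so lines of a group that come before the group's first match (such as "3 apple lamb white" in the input above) have already been printed by the time the match is seen.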
awk '{
    if($0!~/(cat|orange)/){
        if(!($1 in f)){
            print $0;
        }
    }else{
        f[$1]=1
    }
}' input
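Because the question states that the data is grouped sequentially by the first field, a single pass can still give the requested output if each group is buffered until its last line has been read. This is a minimal sketch assuming that grouping holds; the names filter_groups, emit, key, buf and bad are made up for illustration.

```shell
# Hypothetical helper: filter_groups FILE
# Single pass; assumes lines with the same first field are contiguous.
filter_groups() {
  awk '
    # Print the buffered group unless one of its lines matched a keyword.
    function emit() { if (!bad) printf "%s", buf }
    # First field changed: emit (or drop) the previous group and reset.
    # Appending "" forces a string comparison, so "1" and "01" differ.
    ($1 "") != key { emit(); key = $1 ""; buf = ""; bad = 0 }
    /cat|orange/ { bad = 1 }     # mark the current group for removal
    { buf = buf $0 "\n" }        # collect every line of the group
    END { emit() }               # handle the final group
  ' "$1"
}
```

Called as `filter_groups input`, it only needs to hold one group in memory at a time and also works on a stream that cannot be read twice.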
Answer 2: (score: 0)
If a non-awk answer is acceptable, this is how I would do it in perl:
use strict;
use warnings;
my @disallowed_keywords = qw ( cat orange );
my %records;
my %ids_to_reject;
# Build one alternation regex from the keyword list
my $reject_regex = join( "|", @disallowed_keywords );
$reject_regex = qr/$reject_regex/;
while ( my $line = <DATA> ) {
    chomp $line;    # strip the newline so the join below does not double it
    my ($id) = split( ' ', $line );
    push( @{ $records{$id} }, $line );
    if ( $line =~ $reject_regex ) { $ids_to_reject{$id}++ }
}
# Print every group whose id was never flagged
foreach my $id ( sort keys %records ) {
    if ( not $ids_to_reject{$id} ) {
        print join( "\n", @{ $records{$id} } ), "\n";
    }
}
__DATA__
1 orange dog red
1 apple cat green
2 peach frog grey
3 orange lamb white
3 mango cat yellow
3 apple mouse blue