Question

This question有一个很好的答案，说您可以使用awk '!seen[$0]++' file.txt从文件中删除不连续的重复行。仅当匹配模式时，如何才能从文件中删除非连续重复行？例如仅当它们包含字符串“ #####”

时

示例输入

deleteme.txt ##########
1219:                            'PCM BE PTP'
deleteme.txt ##########
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1225:                          , 'PCM FE/MID PTP'

所需的输出

deleteme.txt ##########
1219:                            'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1225:                          , 'PCM FE/MID PTP'

Answer 1

您可以使用

awk '!/#####/ || !seen[$0]++'

或者作为Ed Morton suggests的同义词

awk '!(/#####/ && seen[$0]++)'

这里，!seen[$0]++与往常一样，它将删除所有重复的行。 !/#####/部分匹配包含#####模式的行，并取消匹配。结合使用||的两个模式将删除其中所有具有#####模式的重复行。

查看online awk demo：

s="deleteme.txt ##########
1219:                            'PCM BE PTP'
deleteme.txt ##########
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1223  #####:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1225:                          , 'PCM FE/MID PTP'"
awk '!/#####/ || !seen[$0]++' <<< "$s"

输出：

deleteme.txt ##########
1219:                            'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1223  #####:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1225:                          , 'PCM FE/MID PTP'

Answer 2

使用文件配置模式尝试此Perl命令行正则表达式解决方案。

perl -0777 -ne ' $z=$y=$_; 
                 while( $y ne $x) 
                 { $z=~s/(^[^\n]+?\s+##########.*?$)(.+?)\K(\1\n)//gmse ; $x=$y ;$y=$z } ; 
                 print "$z" '

具有给定的输入

$ cat toucan.txt
deleteme.txt ##########
1219:                            'PCM BE PTP'
deleteme.txt ##########
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1225:                          , 'PCM FE/MID PTP'

$ perl -0777 -ne ' $z=$y=$_; while( $y ne $x) { $z=~s/(^[^\n]+?\s+##########.*?$)(.+?)\K(\1\n)//gmse ; $x=$y ;$y=$z } ; print "$z" ' toucan.txt
deleteme.txt ##########
1219:                            'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1225:                          , 'PCM FE/MID PTP'

$

Answer 3

每当我想到匹配图案和选择性打印时，我都会想到实用的提取和报告语言：Perl！这是一个Perl内衬，可以满足您的要求。您应该能够将其复制粘贴到外壳中并使其工作：

perl -wnle 'BEGIN { $rows_with_five_hashes = {}; } $thisrow = $_; if ($thisrow =~ /[#]{5}/) { if (!exists $rows_with_five_hashes->{$thisrow}) { print; } $rows_with_five_hashes->{$thisrow}++; } else { print; }' input.txt

这里是相同的Perl，带有换行符和注释（为了清楚起见，请注意：这不能原样执行）：

BEGIN {
  # create a counter for rows that match the pattern
  $rows_with_five_hashes = {}; 
} 
# capture the row from the input file
$thisrow = $_;
if ($thisrow =~ /[#]{5}/) { 
  if (!exists $rows_with_five_hashes->{$thisrow}) { 
    # this row matches the pattern and we haven't seen it before
    print; 
  } 
  # Increment the counter for rows that match the pattern.
  # Do this AFTER we print, or else our "exists" print logic fails.
  $rows_with_five_hashes->{$thisrow}++;
} 
else { 
  # print all rows that don't match the pattern
  print;
}

Ruby具有类似的“单行”功能，可直接在命令行上运行代码（其中很多是从Perl借用的）。

有关wnle命令行开关的更多信息，请查看Perl docs about that。如果您有很多文件要修改并使用单个Perl命令保留原始文件的备份副本，请查看这些文档中的-i开关。

如果您发现自己一直在运行该脚本，并且希望保留一个方便的可执行脚本，则可以很轻松地对其进行修改，使其在几乎所有具有Perl解释器的系统上运行。

Answer 4

这可能对您有用（GNU sed）：

sed '/#$/{G;/^\(\S*\s\).*\1/!P;h;d}' file

除感兴趣的行以外的所有行均正常打印。

将先前感兴趣的行添加到当前行，并使用模式匹配，如果以前没有遇到过这样的行，请打印它。然后将模式空间重新存储在保留空间中，为下一场比赛做好准备，并删除模式空间。

仅当重复的行与模式匹配时才删除它们

4 个答案: