Question

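I want to remove duplicate lines from the paragraphs in a file that begin with "SET CURRENT". Among paragraphs that share the same first line, a line that has already appeared should be deleted, but duplicate lines that belong to paragraphs with a different first line must be kept. For example: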

If I have the following file:

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file3 FOR 1001.file3 ;
CREATE SYN file3 FOR 1001.file3 ;

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file7 FOR 1000.file7 ;

SET CURRENT = 'bbb' ;
CREATE SYN file5 FOR 1002.file5 ;
CREATE SYN file6 FOR 1003.file6 ;

SET CURRENT = 'bbb' ;  
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file8 FOR 1002.file8 ;
CREATE SYN file6 FOR 1003.file6 ;

the result should look like this:

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file2 FOR 1000.file2 ;
CREATE SYN file3 FOR 1001.file3 ;

SET CURRENT = 'aaa' ;
CREATE SYN file7 FOR 1000.file7 ;

SET CURRENT = 'bbb' ;
CREATE SYN file5 FOR 1002.file5 ;
CREATE SYN file6 FOR 1003.file6 ;

SET CURRENT = 'bbb' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file8 FOR 1002.file8 ;

Answer 1

With awk you can do something like this:

awk 'NF==0{print;next};/^SET CURRENT/{c=$4;print;next}!seen[c,$0]++' file

With some comments to make it more readable:

awk ' NF == 0 {       # If we find an empty line
          print       # print the line
          next        # and skip to the next record
      }
      /^SET CURRENT/{ # If we find a line beginning with "SET CURRENT"
          c = $4      # Store the value in the 4th field
          print       # Print the current line
          next        # and skip to the next record  
      }
      !seen[c,$0]++   # Print the line if the combination of the "c" value
                      # and the current line has not been stored
                      # in the array "seen" yet, then store the
                      # combination in the array
                      # (so that later repeats are not printed)
      ' file
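To check the command against data like the sample in the question, a minimal sketch along the following lines may help. It assumes a POSIX shell with `mktemp` and `awk` available; the variable names `input` and `output` are only illustrative, and the input is a shortened version of the question's file.

```shell
# Write a shortened version of the sample input to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file3 FOR 1001.file3 ;
CREATE SYN file3 FOR 1001.file3 ;

SET CURRENT = 'aaa' ;
CREATE SYN file1 FOR 1000.file1 ;
CREATE SYN file7 FOR 1000.file7 ;

SET CURRENT = 'bbb' ;
CREATE SYN file1 FOR 1000.file1 ;
EOF

# Same three rules as the awk command above, one rule per line.
output=$(awk '
  NF == 0        { print; next }           # keep blank separator lines
  /^SET CURRENT/ { c = $4; print; next }   # remember the paragraph key
  !seen[c, $0]++                           # print a line only the first time per key
' "$input")

printf '%s\n' "$output"
rm -f "$input"
```

The repeated file3 line and the second file1 line under 'aaa' are dropped, while file1 under 'bbb' is kept because it is stored under a different key.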

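Here is how !seen[c,$0]++ works: when a comma is used in an array subscript, the two values are joined into a single string separated by the SUBSEP character. In this case we use the combination of the value of c and the current line ($0) as the index, because that combination is what must be unique after filtering. With !seen[c,$0] we test whether the combination has already been stored in the array. If it has not, the element is empty (zero), so the expression evaluates to true and the line is printed. If it has, the expression evaluates to false and the line is not printed. The postfix increment operator counts the occurrences of the index, so a line is printed only the first time it appears and not for any later match.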
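The mechanics described above can be observed directly with a small sketch. Instead of printing the lines, it prints the value that seen[c,$0]++ returns before the increment, and checks that the string built by hand with SUBSEP is the same index awk stores. The variable names `key`, `count` and `result` are assumptions made for this illustration.

```shell
# Three data lines: a repeat under 'aaa', then the same text under 'bbb'.
result=$(printf '%s\n' \
  "SET CURRENT = 'aaa' ;" \
  "CREATE SYN file1 FOR 1000.file1 ;" \
  "CREATE SYN file1 FOR 1000.file1 ;" \
  "SET CURRENT = 'bbb' ;" \
  "CREATE SYN file1 FOR 1000.file1 ;" |
awk '
  /^SET CURRENT/ { c = $4; next }   # remember the paragraph key, print nothing
  {
    key = c SUBSEP $0               # the string awk builds for the subscript [c, $0]
    count = seen[c, $0]++           # value before the increment: 0 on first sight
    printf "%s %d %s\n", c, count, ((key in seen) ? "stored" : "missing")
  }')

printf '%s\n' "$result"
```

A count of 0 is the case where !seen[c,$0]++ is true and the line would be printed; any later occurrence under the same key returns a non-zero count and is skipped.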

Answer 2

Start-Process $Gros\install.bat -WorkingDirectory $Gros -wait

Using sed or awk
