Question

I want to edit the file given below. The key is the leading (0)/(1)/(2) etc., together with the [STRING] at the end of the line.

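If any two lines start with the same number and contain [STRING], only the first line should be kept and the others deleted, and a comment "--- total_number_of_lines" should be appended to the end of that first line, e.g. --- 2 or --- 3 or --- 4.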

See the example below for reference.

(0) some text let it be [STRING]
(0) some text1 let it be
(1) some text2 let it be 
(1) some text3 let it be [STRING]
(1) some text4 let it be [STRING]
(1) some text5 let it be [STRING]
(1) some text6 let it be [STRING]
(1) some text7 let it be [XYZ]
(0) some text8 let it be [STRING]
(0) some text9 let it be
(1) some text10 let it be 
(1) some text11 let it be [STRING]
(1) some text12 let it be [STRING]
(2) some text13 let it be [STRING]
(2) some text14 let it be [STRING]
(2) some text15 let it be [STRING]
(3) some text16 let it be [ABC]
(3) some text17 let it be [STRING]
(3) some text18 let it be [STRING]
(1) some text19 let it be [STRING]
(1) some text20 let it be [STRING]
(1) some text21 let it be [STRING]
(1) some text22 let it be [STRING]
(1) some text23 let it be [DEF]

This needs to be edited to:

(0) some text let it be [STRING]
(0) some text1 let it be
(1) some text2 let it be 
(1) some text3 let it be [STRING] --- 4
(1) some text7 let it be [XYZ]
(0) some text8 let it be [STRING]
(0) some text9 let it be
(1) some text10 let it be 
(1) some text11 let it be [STRING] --- 2
(2) some text13 let it be [STRING] --- 3
(3) some text16 let it be [ABC]
(3) some text17 let it be [STRING] --- 2
(1) some text19 let it be [STRING] --- 4 
(1) some text23 let it be [DEF]

Does anyone have any suggestions? I have corrected the question to state the requirements more clearly.

Answer 1

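The main trick with this problem is getting the behaviour right at the start and at the end of processing. Apart from that, it is just a matter of keeping a counter and comparing each line with the previous one. Here is how to do it in Tcl: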

# How to write out a line with an optional count; customise as necessary
proc writeLine {count line} {
     if {$count > 1} {
         puts "$line --- $count"
     } else {
         puts $line
     }
}

# Note that the prev variable is not set at this point

set count 0
while {[gets stdin line] >= 0} {
    # Extract the parts we care about
    if {[regexp {^(\(\d+\)).*(\[[^][]+\])$} $line -> a b]} {
        set AB $a$b
        if {[info exist prev] && $prevAB ne $AB} {
            writeLine $count $prev
            set count 0
            set prev $line
        } elseif {![info exist prev]} {
            set prev $line
        }
        set prevAB $AB
        incr count
    } else {
        # Unmatched line; flush and print
        if {[info exist prev]} {
            writeLine $count $prev
        }
        writeLine 1 $line
        set count 0
        unset -nocomplain prev prevAB
    }
}
# Print out the final line if necessary
if {[info exist prev]} {
    writeLine $count $prev
}
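
For readers who do not use Tcl, the same counter-and-previous-line idea can be sketched in Python. This is only an illustrative port: the names `KEYED`, `write_line`, and `collapse` are made up here, and the function collects output in a list instead of printing. Like the Tcl code, it treats any trailing bracketed tag as part of the key, not only [STRING].

```python
import re

# Assumed pattern, mirroring the Tcl regexp: a leading "(n)" and a trailing "[TAG]".
KEYED = re.compile(r"^(\(\d+\)).*(\[[^\[\]]+\])$")


def write_line(out, count, line):
    """Append the line, adding ' --- count' only when the run is longer than 1."""
    out.append(f"{line} --- {count}" if count > 1 else line)


def collapse(lines):
    out = []
    prev = None       # first line of the current run, or None if no run is open
    prev_key = None   # key ("(n)" + "[TAG]") of the current run
    count = 0
    for line in lines:
        m = KEYED.match(line)
        if m:
            key = m.group(1) + m.group(2)
            if prev is not None and key != prev_key:
                # Key changed: flush the finished run and start a new one.
                write_line(out, count, prev)
                prev, count = line, 0
            elif prev is None:
                prev = line
            prev_key = key
            count += 1
        else:
            # Unmatched line: flush any open run, then emit the line as-is.
            if prev is not None:
                write_line(out, count, prev)
            out.append(line)
            prev, prev_key, count = None, None, 0
    # Flush the final run, if any.
    if prev is not None:
        write_line(out, count, prev)
    return out
```

The two flushes, one inside the loop and one after it, are the start-of-run and end-of-input cases that the answer calls the main trick.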

Answer 2

Edit: for the changed requirements, this approach can be modified to

awk 'function tok() { return $0 ~ /\[STRING\]/ ? $1 : "" } function reset() { lastline = $0; prev = tok(); ctr = 1 } function commit() { print lastline (ctr == 1 ? "" : " --- " ctr); reset() } NR == 1 { reset(); next } !tok() || prev != tok() { commit(); next } { ++ctr } END { commit(); }'

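The general approach is to read one line ahead before writing anything: each block is printed only after it has ended, and that includes blocks consisting of a single line. The code works as follows: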

# Token for repetition detection: Lines that do not contain [STRING] are
# exempt, so for them we report an empty / no token.
function tok() {
  return $0 ~ /\[STRING\]/ ? $1 : ""
}

# reset counters etc. when a new block begins
function reset() {
  lastline = $0
  prev = tok()
  ctr = 1
}

# Write saved line, with counter if appropriate
function commit() {
  print lastline (ctr == 1 ? "" : " --- " ctr)
  reset()
}

# We write every block after it is over, and this includes single lines.
# So: First line, just prime the pump, do nothing else.
NR == 1 {
  reset()
  next
}

# If the new line is exempt (no token reported) or the token changed,
# print stuff, reprime pump.
!tok() || prev != tok() { 
  commit()
  next
}

# otherwise increase counter
{
  ++ctr
}

# and in the end, handle the last block.
END {
  commit()
}
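
The same block logic can also be expressed with `itertools.groupby` in Python. This is a rough sketch, not a translation of the awk script line by line: `collapse_runs` and `block_key` are hypothetical names, and the trick of keying exempt lines on their index (so each forms its own block) is an assumption made for this example.

```python
from itertools import groupby


def collapse_runs(lines, marker="[STRING]"):
    def block_key(item):
        index, line = item
        if marker in line:
            # Token is the first whitespace-separated field, like $1 in awk.
            return (line.split()[0], None)
        # Exempt line: a unique key so it never merges with its neighbours.
        return (None, index)

    out = []
    for _, block in groupby(enumerate(lines), key=block_key):
        block_lines = [line for _, line in block]
        first, n = block_lines[0], len(block_lines)
        out.append(first if n == 1 else f"{first} --- {n}")
    return out
```

Because `groupby` only merges adjacent items with equal keys, separate runs with the same number are counted separately, which is what the expected output in the question shows.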

Answer 3

This does what you said you require ("If any two lines start with same number and has [STRING] in it then first line only should be kept, other should be deleted and append a comment at last of first line with "--- total_number_of_lines", as --- 2 or --- 3 or --- 4"):

$ cat tst.awk
NR==FNR { if (/\[STRING\]$/) cnt[$1]++; next }
/\[STRING\]$/ {
    if (seen[$1]++) next
    else $0 = $0 " --- " cnt[$1]
}
1

$ awk -f tst.awk file file
(0) some text let it be [STRING] --- 2
(0) some text1 let it be
(1) some text2 let it be
(1) some text3 let it be [STRING] --- 10
(1) some text7 let it be [XYZ]
(0) some text9 let it be
(1) some text10 let it be
(2) some text13 let it be [STRING] --- 3
(3) some text16 let it be [ABC]
(3) some text17 let it be [STRING] --- 2
(1) some text23 let it be [DEF]

But clearly that does not match your expected output, because your expected output does not match your stated requirements.
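
To make the difference concrete, here is a small Python sketch of the literal, file-wide reading that tst.awk implements. The name `collapse_global` is made up for this example. Like the awk script, it counts every line ending in [STRING] per leading number across the whole input, keeps only the first such line, and always appends the total, even when the total is 1.

```python
from collections import Counter


def collapse_global(lines, marker="[STRING]"):
    # Pass 1: total number of marked lines per leading field, over the whole input.
    totals = Counter(line.split()[0] for line in lines if line.endswith(marker))

    # Pass 2: keep only the first marked line per leading field, with its total.
    seen = set()
    out = []
    for line in lines:
        if line.endswith(marker):
            key = line.split()[0]
            if key in seen:
                continue
            seen.add(key)
            line = f"{line} --- {totals[key]}"
        out.append(line)
    return out
```

The expected output in the question instead counts each run of adjacent lines on its own, which is why the two results differ.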

Answer 4

Another approach is to use the Unix uniq command. On macOS (BSD), it is:

uniq -c -s20

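This counts runs of identical adjacent lines, skipping the first 20 characters of each line in the comparison. It puts the count at the front. You can move the count to the end with: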

uniq -c -s20 | sed -E 's/^ *([0-9]+) (.*)/\2 --- \1/g'

On Ubuntu, it is sed -r rather than sed -E (recent versions of GNU sed also accept -E).
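
For comparison, the effect of that uniq and sed pipeline can be approximated in Python. This is a sketch under assumptions: `uniq_count_skip` is a hypothetical name, and the skip width of 20 is simply the value used in the command above, so whether it suits a given file depends on how its lines are laid out.

```python
from itertools import groupby


def uniq_count_skip(lines, skip=20):
    """Collapse adjacent lines that are equal after the first `skip` characters,
    keeping the first line of each run and appending the run length."""
    out = []
    for _, run in groupby(lines, key=lambda line: line[skip:]):
        run = list(run)
        out.append(f"{run[0]} --- {len(run)}")
    return out
```

As with the pipeline, every output line gets a count, including lines that occur only once.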

How do I merge similar lines into one if they match a string?
