Question

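I need to append an asterisk to a line, but only if that line is both preceded and followed by an empty line (FYI, "empty" means the line contains no whitespace at all).

Suppose I have the following file: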

foo

foo
foo

foo

foo

I would like the output to look like this:

foo

foo
foo

foo*

foo

I tried modifying the following awk command (found here):

awk 'NR==1 {l=$0; next}
       /^$/ {gsub(/test/,"xxx", l)}
       {print l; l=$0}
       END {print l}' file

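to suit my purposes, but I got all tied up.

Of course, sed or Perl solutions are welcome too!

Update

It turns out the question I asked was not quite right. What I really need is code that appends text to non-blank lines that do not start with whitespace and that are followed, two lines down, by a non-blank line that also does not start with whitespace.

For this revised version of the question, suppose I have the following file: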

foo

third line foo

fifth line foo
 this line starts with a space foo
 this line starts with a space foo

ninth line foo

eleventh line foo

 this line starts with a space foo

last line foo

I would like the output to look like this:

foobar

third line foobar

fifth line foo
 this line starts with a space foo
 this line starts with a space foo

ninth line foobar

eleventh line foo

 this line starts with a space foo

last line foo

For that, this sed one-liner does the trick:

sed '1N;N;/^[^[:space:]]/s/^\([^[:space:]].*\o\)\(\n\n[^[:space:]].*\)$/\1bar\2/;P;D' infile

Thanks to Benjamin W.'s clear and informative answer below, I was able to cobble this one-liner together!
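
For comparison, the revised rule can also be sketched in awk with a two-line buffer. This is only an illustration, not part of the original post: the function name append_bar and the helper plain are hypothetical, and "whitespace" is assumed to mean a leading space or tab.

```shell
# Sketch only: append_bar is a hypothetical name, and "whitespace" is
# assumed to mean a leading space or tab.
append_bar() {
    awk '
        # plain(s): true when s is non-empty and does not start with a space or tab
        function plain(s) { return s ~ /^[^ \t]/ }

        NR > 2 {
            # a = line two back, b = previous line, $0 = current line
            if (plain(a) && b == "" && plain($0)) a = a "bar"
            print a
        }
        { a = b; b = $0 }           # slide the two-line buffer forward
        END {
            if (NR > 1) print a     # flush the two lines still buffered
            if (NR > 0) print b
        }
    ' "$@"
}
```

It can be called as `append_bar infile` or at the end of a pipeline. Printing is delayed by two lines so that the line two back can still be changed when the current line is read.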

Answer 1

A sed solution:

$ sed '1N;N;s/^\(\n.*\)\(\n\)$/\1*\2/;P;D' infile
foo

foo
foo

foo*

foo

N;P;D is the idiomatic way of looking at two lines at a time: append the next line to the pattern space, then print and delete the first line.

1N;N;P;D extends this so that the pattern space always holds three lines, which is what we want here.
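
To see the three-line window in action, the pattern space can be dumped on every cycle with sed's `l` command, which prints embedded newlines as `\n` and marks the end with `$`. This is a small sketch; the function name show_window and the five-line numeric input are made up for the demonstration.

```shell
# show_window is a hypothetical helper: it feeds five numbered lines to sed
# and prints the pattern space once per cycle instead of the usual output.
show_window() {
    printf '1\n2\n3\n4\n5\n' | sed -n '1N;N;l;D'
}

show_window
# 1\n2\n3$
# 2\n3\n4$
# 3\n4\n5$
```

Each cycle shows one more line shifted in at the end and one line dropped from the front, which is exactly the window the substitution gets to inspect.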

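The substitution matches if the first and last lines in the pattern space are empty (^\n and \n$), and appends a * to the line between the two empty lines.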

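Note that this also matches, and appends a * to, the second of three consecutive empty lines, which is probably not what you want. To make sure that doesn't happen, the first capture group has to contain at least one non-whitespace character: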

sed '1N;N;s/^\(\n[^[:space:]].*\)\(\n\)$/\1*\2/;P;D' infile
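
The difference between the two commands can be checked on a small input with three consecutive empty lines. This is a sketch with made-up sample data (the helper name sample is hypothetical); the interesting part is the third output line.

```shell
# Five input lines: x, three empty lines, y
sample() { printf 'x\n\n\n\ny\n'; }

# Original substitution: the third output line (the middle empty line) becomes "*"
sample | sed '1N;N;s/^\(\n.*\)\(\n\)$/\1*\2/;P;D'

# Stricter substitution: the third output line stays empty
sample | sed '1N;N;s/^\(\n[^[:space:]].*\)\(\n\)$/\1*\2/;P;D'
```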

Question from the comments

Can we skip appending the * if the second line above starts with abc?

Example input file:

foo

foo
abc

foo

foo

foo

foo

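There are three foo lines sitting between empty lines, but the first of them should not get a * appended because the second line above it starts with abc. This can be done as follows: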

$ sed '1{N;N};N;/^abc/!s/^\(.*\n\n[^[:space:]].*\)\(\n\)$/\1*\2/;P;D' infile
foo

foo
abc

foo

foo*

foo*

foo

This keeps four lines in the pattern space at a time and only performs the substitution if the pattern space does not start with abc:

1 {      # On the first line
    N    # Append next line to pattern space
    N    # ... again, so there are three lines in pattern space
}
N        # Append fourth line
/^abc/!  # If the pattern space does not start with abc...
    s/^\(.*\n\n[^[:space:]].*\)\(\n\)$/\1*\2/   # Append '*' to 3rd line in pattern space
P        # Print first line of pattern space
D        # Delete first line of pattern space, start next cycle

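Two remarks:

BSD sed requires an extra semicolon: 1{N;N;} instead of 1{N;N}.
If the first and third lines of the file are empty, the second line does not get an asterisk appended, because we only start checking once there are four lines in the pattern space. This can be fixed by adding an extra substitution to the 1{} block: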
```
1{N;N;s/^\(\n[^[:space:]].*\)\(\n\)$/\1*\2/}
```
(remember the extra ; for BSD sed), but trying to cover every edge case makes the sed even less readable, especially as a one-liner:
```
sed '1{N;N;s/^\(\n[^[:space:]].*\)\(\n\)$/\1*\2/};N;/^abc/!s/^\(.*\n\n[^[:space:]].*\)\(\n\)$/\1*\2/;P;D' infile
```
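
If readability matters more than brevity, the same rule, including the start-of-file edge case, can be sketched in awk by holding the lines in an array. This is not from the original answer: the name star_lines is hypothetical, "whitespace" is assumed to mean a space or tab, and the whole file is kept in memory, so it only suits inputs that fit there.

```shell
# Sketch only: star_lines is a hypothetical name; the file is read into memory.
star_lines() {
    awk '
        { line[NR] = $0 }                        # remember every line
        END {
            for (i = 1; i <= NR; i++) {
                star = (i > 1 && i < NR &&
                        line[i] ~ /^[^ \t]/ &&   # starts with a non-space character
                        line[i-1] == "" &&       # empty line before
                        line[i+1] == "" &&       # empty line after
                        !(i > 2 && line[i-2] ~ /^abc/))   # two lines up is not abc...
                print line[i] (star ? "*" : "")
            }
        }
    ' "$@"
}
```

Because every condition is spelled out by index, a line near the start of the file needs no special block: the `i > 2` guard simply skips the abc check when there is no line two above.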

Answer 2

One way to think about problems like this is as a state machine.

start: state = 0

0: /* looking for a blank line */
   if (blank line) state = 1

1: /* leading blank line(s) */
   if (not blank line) {
       nonblank = line
       state = 2
   }

2: /* saw non-blank line */
   if (blank line) {
       output nonblank*
       state = 1
   } else {
       state = 0
   }

We can translate that directly into an awk program:

BEGIN {
        state = 0;                # start in state 0
}

state == 0 {                      # looking for a (leading) blank line
        print;
        if (length($0) == 0) {    #   found one
                state = 1;
                next;
        }
}

state == 1 {                      # have a leading blank line
        if (length($0) > 0) {     #   found a non-blank line
                saved = $0;       #     save it
                state = 2;
                next;
        } else {
                print;            # multiple leading blank lines (ok)
        }
}

state == 2 {                      # saw the non-blank line
        if (length($0) == 0) {    #   followed by a blank line
                print saved "*";  #     BINGO!
                state = 1;        # to the saw a blank-line state
        } else {                  # nope, consecutive non-blank lines
                print saved;      #   as-is
                state = 0;        # to the looking for a blank line state
        }
        print;
        next;
}

END {                             # cleanup, might have something saved to show
        if (state == 2) print saved;
}

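This is not the shortest way, nor the fastest, but it is probably the most straightforward and the easiest to understand.

Edit

Here is a comparison of Ed's way and mine (see the comments under his answer for context). I replicated the OP's input a million times and then timed the runs: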

# ls -l
total 22472
-rw-r--r--. 1 root root      111 Mar 13 18:16 ed.awk
-rw-r--r--. 1 root root 23000000 Mar 13 18:14 huge.in
-rw-r--r--. 1 root root      357 Mar 13 18:16 john.awk

# time awk -f john.awk < huge.in > /dev/null
2.934u 0.001s 0:02.95 99.3%     0+0k 112+0io 1pf+0w

# time awk -f ed.awk huge.in huge.in > /dev/null
14.217u 0.426s 0:14.65 99.8%    0+0k 272+0io 2pf+0w

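His version took about 5 times as long, did twice as much I/O, and (not shown in this output) used 1400 times as much memory.

Edit from Ed Morton: for those of us unfamiliar with the output of the time command John used above, here are the 3rd-invocation results from the normal UNIX time program on cygwin/bash using GNU awk 4.1.3: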

$ wc -l huge.in
1000000 huge.in

$ time awk -f john.awk huge.in > /dev/null
real    0m1.264s
user    0m1.232s
sys     0m0.030s

$ time awk -f ed.awk huge.in huge.in > /dev/null
real    0m1.638s
user    0m1.575s
sys     0m0.030s

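So, if you would rather write considerably more than 3 lines of code to save about a third of a second when processing a million-line file, then John's answer is the right one for you.

EDIT #3

That is the standard "time" built into tcsh/csh. Even if you don't recognize it, the output should be intuitively obvious. And yes, boys and girls, my solution can also be written as a short, incomprehensible mess: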

s == 0 { print; if (length($0) == 0) { s = 1; next; } }
s == 1 { if (length($0) > 0) { p = $0; s = 2; next; } else { print; } }
s == 2 { if (length($0) == 0) { print p "*"; s = 1; } else { print p; s = 0; } print; next; }
END { if (s == 2) print p; }

Answer 3

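Here is a Perl filter version, for illustration; hopefully it is clear how it works. It would be possible to write a version with lower input-to-output latency (2 lines instead of 3), but I don't think that matters.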

my @lines;

while (<>) {
    # Keep three lines in the buffer, print them as they fall out
    push @lines, $_;
    print shift @lines if @lines > 3;

    # If a non-empty line occurs between two empty lines...
    if (@lines == 3 && $lines[0] =~ /^$/ && $lines[2] =~ /^$/ && $lines[1] !~ /^$/) {
        # place an asterisk at the end
        $lines[1] =~ s/$/*/;
    }
}

# Flush the buffer at EOF
print @lines;

Answer 4

perl one-liner

perl -0777 -lne's/(?<=\n\n)(.*?)(\n\n)/$1\*$2/g; print' ol.txt

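-0777 "slurps" in the whole file, assigning it to $_; the (global) substitution is then run on it, and the result is printed.

The lookbehind (?<=text) is needed for repeating patterns such as [empty][line][empty][line][empty]. It is a "zero-width assertion" that only checks that the pattern is there without consuming it. That way the pattern stays available for the next match.

Such consecutive repeating patterns trip up the initially posted /(\n\n)(.*?)(\n\n)/$1$2\*$3/, because the trailing \n\n has just been matched and is therefore not considered as the start of the next pattern.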

Answer 5

Update: my solution also fails after two consecutive matches, as described above, and needs the same lookbehind: s/(?<=\n\n)(\w+)\n\n/\1\2*\n\n/mg;

The simplest approach is to use a multi-line match:

    local $/;     ## slurp mode
    $file = <DATA>;

    $file =~ s/\n\n(\w+)\n\n/\n\n$1*\n\n/mg;   ## $1, not \1, in the replacement
    print $file;                               ## print, so a % in the data is not treated as a format

    __DATA__
    foo

    foo
    foo

    foo

    foo

Answer 6

The simplest and clearest way is to do it in two passes:

$ cat tst.awk
NR==FNR { nf[NR]=NF; nr=NR; next }
FNR>1 && FNR<nr && NF && !nf[FNR-1] && !nf[FNR+1] { $0 = $0 "*" }
{ print }

$ awk -f tst.awk file file
foo

foo
foo

foo*

foo

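The above makes one pass to record the number of fields on each line (NF is zero for an empty line), and then a second pass that simply checks your requirements: the current line is not the first or last line of the file, it is not empty, and the lines before and after it are both empty.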

Answer 7

An alternative awk solution (single pass)

$ awk 'NR>2 && !pp && !NF {p=p"*"}
       NR>1 {print p}
            {pp=length(p); p=$0}
       END  {print p}' foo
foo

foo
foo

foo*

foo

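Explanation: printing is deferred until the next line has been read so that the decision can be made, which means we need to keep the previous line in p and the state of the line before that in pp (a length of zero means it was empty). Do the bookkeeping, and print the last line at the end.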
