Question

I have an ASCII file with the following contents:

START
this is my home
this is my pc

START
this is my linux
this is my awk
this is nice

START
this is a single line

START
this is my work
this is the end
this line has to be read

START
...
...

START
.
.
.
.

I want to read the lines between START and the next blank line and print them in a delimited format. The output should look like this:

this is my home;this is my pc
this is my linux;this is my awk;this is nice
this is a single line
this is my work;this is the end;this line has to be read

I am using a semicolon as the delimiter. Please note: the number of lines between START and the blank line is not fixed.

I tried using awk, but I can only read one line after START:

awk 'BEGIN { RS = "START" } ; { print $1 }'

Can anyone point me to the right forum or in the right direction?

Thanks

Answer 1

You can do this:

awk -v RS="" '{$1=$1}1' file
START this is my home this is my pc
START this is my linux this is my awk this is nice
START this is a single line
START this is my work this is the end this line has to be read

To make sure each section starts with START, and to remove it:

awk -v RS="" '{$1=$1} /^START/ {gsub(/^START /,"");print}' file
this is my home this is my pc
this is my linux this is my awk this is nice
this is a single line
this is my work this is the end this line has to be read

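Some additional information on why your awk failed: print $1 prints only the first field of each record. After changing RS you need to rebuild each record with $1=$1,
and then print the whole record with 1 or {print $0}. So, to make your awk work: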

awk 'BEGIN { RS = "START" } {$1=$1} 1' file

Or like this:

awk -v RS="START" '{$1=$1} NR>1' file

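NR>1 prevents the empty first record (everything before the first START) from being printed.

Using more than one character in RS makes this less portable; you need an awk that supports it, such as GNU awk.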
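
To illustrate the point about rebuilding the record, here is a minimal sketch. It uses paragraph mode (RS="") rather than RS="START" so that it should run on any POSIX awk. The sample function and its two-block input are made up for the demonstration.

```shell
# Hypothetical two-block sample, not the asker's real file.
sample() {
  printf 'START\nthis is my home\nthis is my pc\n\nSTART\nthis is a single line\n'
}

# print $1 emits only the first whitespace-separated field of each record.
sample | awk -v RS="" '{ print $1 }'
# START
# START

# Assigning to a field makes awk rebuild $0 from its fields, joined by OFS
# (a single space by default), so the embedded newlines disappear.
sample | awk -v RS="" '{ $1 = $1; print }'
# START this is my home this is my pc
# START this is a single line
```

Setting OFS to ";" before the rebuild would join the fields with semicolons instead, but with the default field splitting that puts a semicolon between every word, not between lines.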

Answer 2

$ awk -v RS= '{$1=$1} sub(/^START /,"")' file
this is my home this is my pc
this is my linux this is my awk this is nice
this is a single line
this is my work this is the end this line has to be read

Answer 3

This builds a string containing the relevant parts of the input file, with blocks separated by '\n' and lines separated by ';'.

awk '
  t && $0 == "" { t = 0 ; sep = "\n" }
  t             { hold = hold sep $0 ; sep = ";" }
  $0 == "START" { t = 1 }
  END           { print hold }
' file

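The first line handles the end of a block.

If the in-block trigger is set, the second line appends a separator (whichever of "", "\n" or ";" is appropriate) and the current record to the hold buffer.

The third line sets the trigger at the start of a block. It comes last so that the START line itself is not appended; if a block has already started, a "START" line is treated as part of the block.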
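
As a variation on the same idea, the sketch below prints each block as soon as it ends instead of collecting everything in one hold buffer. This is my own rearrangement, not part of the answer above; the names join_blocks, inblock and line are hypothetical, and the sample input is made up.

```shell
# join_blocks is a hypothetical helper name; it reads stdin or the files given.
join_blocks() {
  awk '
    $0 == "START" && !inblock { inblock = 1; line = ""; sep = ""; next }  # block begins
    inblock && $0 == ""       { print line; inblock = 0; next }           # blank line ends it
    inblock                   { line = line sep $0; sep = ";" }           # collect with ";"
    END                       { if (inblock) print line }                 # file ended mid-block
  ' "$@"
}

printf 'START\nthis is my home\nthis is my pc\n\nSTART\nthis is a single line\n' | join_blocks
# this is my home;this is my pc
# this is a single line
```

Printing per block keeps memory use proportional to the largest block rather than to the whole file, at the cost of a slightly longer script.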

Answer 4

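The accepted answer does not keep the individual lines of each block as separate fields, and it does not separate them with ; in the output. The following does: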

awk -v RS='' -F'\n' -v OFS=';' '{sub(/^START\n/,""); $1=$1; print }' file

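RS='' (setting the input record separator RS to the empty string) is an awk idiom with a special meaning: it splits the input into records using blank lines as separators; in other words, each run of consecutive non-empty lines forms one record.
-F'\n' sets the input field separator (also accessible as the special variable FS) to a newline, so that each line of a record (block of lines) becomes its own field.
OFS=';' sets the output field separator to ;, as the OP requested.
sub(/^START\n/,"") removes the START line (plus its trailing newline) from each record (block of lines).
$1=$1 is a trick: assigning to a field variable causes the input record to be rebuilt from its individual fields, using the value of OFS as the separator; here, the individual lines (without their trailing newlines) are effectively joined with ; to form a single line.
print simply outputs the rebuilt record.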
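
To make the "one line per field" point concrete, this small sketch prints NF in front of each rebuilt record. The show_fields wrapper and the sample input are assumptions for illustration only; the awk options are the ones explained above.

```shell
# show_fields is a hypothetical helper; the sample input is made up.
show_fields() {
  awk -v RS='' -F'\n' -v OFS=';' '{
    sub(/^START\n/, "")   # modifying $0 makes awk re-split it into fields
    $1 = $1               # rebuild $0, joining the fields with OFS
    print NF ": " $0      # NF is the number of lines left in the block
  }' "$@"
}

printf 'START\nthis is my home\nthis is my pc\n\nSTART\nthis is a single line\n' | show_fields
# 2: this is my home;this is my pc
# 1: this is a single line
```

Because each line is a field, a specific line of a block can be addressed directly, for example $2 for the second line after START.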

Read lines between patterns and print them in a delimited format
