Text to CSV with awk: mapping matched lines to comma-separated columns

Date: 2017-09-02 06:39:33

Tags: awk

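Suppose I have a text file whose records are separated by // and whose headers carry either single-line or multi-line content, for example: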

SOMETHING (single line content)
ATHING Lorem (single line content) 
THETHING (single line content)
THING (multi-line content)
ANOTHERTHING (single line content)
//
SOMETHING (single line content)
ATHING Lorem (single line content) 
THETHING (single line content)
THING (multi-line content)
ANOTHERTHING (single line content)

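I want to print: 1) the line matching "ATHING" and 2) the multi-line block that starts at THING and runs up to the next header, so that I end up with this output: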

ATHING content, THING content (multi-line concatenated to single line)
ATHING content, THING content (multi-line concatenated to single line)

2 Answers:

Answer 0: (score: 2)

An awk solution:

Sample testfile contents:

SOMETHING (single line content)
ATHING Lorem (single line content) 
THETHING (single line content)
THING (multi-line content)
some tetx
sdsdf text
ANOTHERTHING (single line content)
//
SOMETHING (single line content)
ATHING Lorem (single line content) 
THETHING (single line content)
THING (multi-line content)
text 
text
ANOTHERTHING (single line content)

The job:

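awk -v th="^THING" '/^ATHING/{ printf "%s,",$0 }   # ATHING line, then a comma, no newline
       $0~th{ f=1 }                                # a THING line turns the flag on
       # while the flag is set, any other line starting with a capital ends the block
       f{ if ($0~/^[A-Z]/ && $0!~th){ f=0; print "" } else printf " %s",$0; }' testfile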

Output:

ATHING Lorem (single line content) , THING (multi-line content) some tetx sdsdf text
ATHING Lorem (single line content) , THING (multi-line content) text  text
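
For reference, the same flag-based idea can be sketched a little more defensively. This is not the answerer's script, only a variant under a few assumptions: a header is taken to be any line whose first word is entirely upper-case letters (so content lines such as "Some text" are not mistaken for headers), trailing whitespace is trimmed before joining, and a THING block that runs to the end of the file is still printed. The names sample.txt, join_things and emit are made up for illustration.

```shell
# Sample input in the same layout as testfile (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
SOMETHING (single line content)
ATHING Lorem (single line content)
THETHING (single line content)
THING (multi-line content)
Some text
more text
//
SOMETHING (single line content)
ATHING Ipsum (single line content)
THING (multi-line content)
text
EOF

join_things() {
  awk '
    # Print the record collected so far (if any) and reset the state.
    function emit() { if (out != "") print out; out = ""; f = 0 }

    /^\/\//   { emit(); next }                                    # record separator
    /^ATHING/ { sub(/[ \t]+$/, ""); out = $0; f = 0; next }       # first column
    /^THING/  { sub(/[ \t]+$/, ""); out = out ", " $0; f = 1; next }

    # Assumption: an all-caps first word marks the next header and ends the block.
    f && NF && $1 ~ /^[A-Z]+$/ { f = 0; next }

    # Any other non-empty line inside the THING block is appended to the column.
    f && NF   { sub(/[ \t]+$/, ""); out = out " " $0 }

    END       { emit() }                                          # last record
  ' "$1"
}

join_things sample.txt
```

With the sample above this should print:

ATHING Lorem (single line content), THING (multi-line content) Some text more text
ATHING Ipsum (single line content), THING (multi-line content) text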

Answer 1: (score: 0)

BEGIN                   { OFS = ", " }

/^\/\// && line         { print line;
                          line = "";
                          getline;
                          next          }

NR > 1 && line          { line = line OFS $0 }
NR > 1 && !line         { line = $0 }

END                     { print line }

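The awk script builds each output line in the variable line and prints it when appropriate.

  • The BEGIN block sets the separator used to join the lines.
  • The second block runs when a // separator is found and a line has already been assembled in line. It prints that line and resets the variable. It also skips the next input line (the SOMETHING line) and then continues with the following input line from the top of the script.
  • With NR > 1 we skip the very first SOMETHING line. If line already holds something, the current input line is appended to it; otherwise line is simply set to the current input line.
  • Finally, the END block prints the line assembled for the last input block.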

For the given data, this produces:

$ awk -f script.awk file.in
ATHING Lorem (single line content), THETHING (single line content), THING (multi-line content), ANOTHERTHING (single line content)
ATHING Lorem (single line content), THETHING (single line content), THING (multi-line content), ANOTHERTHING (single line content)
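
The accumulate-and-flush steps described above can also be sketched without getline. In this variant, which is only an illustration and not the answerer's script, the SOMETHING line is skipped by matching its name rather than by its position (NR > 1 and the line after //), so it does not depend on SOMETHING always being the first line of a record. The names records.txt and join_records are made up for the example.

```shell
# Sample input copied from the question (records.txt is a hypothetical name).
cat > records.txt <<'EOF'
SOMETHING (single line content)
ATHING Lorem (single line content)
THETHING (single line content)
THING (multi-line content)
ANOTHERTHING (single line content)
//
SOMETHING (single line content)
ATHING Lorem (single line content)
THETHING (single line content)
THING (multi-line content)
ANOTHERTHING (single line content)
EOF

join_records() {
  awk '
    BEGIN        { OFS = ", " }                       # separator used when joining
    /^\/\//      { if (line != "") print line         # flush the finished record
                   line = ""; next }
    /^SOMETHING/ { next }                             # skip this header by name
                 { line = (line == "" ? $0 : line OFS $0) }   # start or extend the record
    END          { if (line != "") print line }       # flush the last record
  ' "$1"
}

join_records records.txt
```

For the question's data this gives the same two lines as the script above. Comparing line against the empty string rather than testing it for truth also avoids dropping a record whose accumulated text happens to be "0".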