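sed or awk script to replace the structure of a text file

Time: 2015-02-23 10:44:09

Tags: regex shell awk sed

I want to create a sed or awk script that, when run as awk -f script.awk oldfile > newfile, turns a given text file oldfile with the content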

Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item

into a new text file newfile with the content:

{Some Heading:} {example text}

{Another Heading:} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}

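Further notes to remove possible ambiguity

  • The script should convert each block (i.e. a run of lines enclosed by blank lines) accordingly.
  • A text file may contain several such blocks, and the order in which they appear is not known.
  • The conversion is conditional: it depends on whether the heading (i.e. the first line of a block) is followed by an item list (indicated by a line starting with "1.").
  • Blocks are always separated by blank lines.

How can I achieve this with sed or awk? (I use zsh, in case that makes a difference.)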


Addendum: I just found out that I actually need to know in advance whether the block is a list:

heading
1. foo
2. bar

{list: heading}{
 [item] foo
 [item] bar
}

So I need to put in "list:" if the block is a list. Is that possible as well?

3 answers:

Answer 0: (score: 4)

With awk, you could do the following:

awk '/^$/ { print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n  [item] " $0; next } { block = block (list ? " " : "") $0 } END { print block (list ? "\n}" : "}") }' filename

The same code, formatted and commented:

#!/usr/bin/awk -f

/^$/ {                               # empty line: print converted block
  print block (list ? "\n}" : "}")   # Whether there's a newline before the
  block = ""                         # closing } depends on whether this is
  next                               # a list. Reset block buffer.
}
block == "" {                        # in the first line of a block:
  block = "{" $0 ":} {"              # format header
  list = 0                           # reset list flag
  next
}
/^[0-9]+\. / {                       # if a data line opens a list
  list = 1                           # set list flag
  sub(/^[0-9]+\. /, "")              # remove number
  block = block "\n  [item] " $0     # format line
  next
}
{                                    # if it doesn't, just append it. Space
  block = block (list ? " " : "") $0 # inside a list to not fuse words.
}
END {                                # and at the very end, print the last
  print block (list ? "\n}" : "}")   # block
}
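
The script above emits the header as soon as it sees the first line of a block, so it cannot yet honor the addendum's "list:" prefix. A minimal sketch of one way to do that, assuming the heading and body are buffered separately and the header is only built when the block ends (the names convert and emit are made up for illustration, and the blank line between blocks is an assumption taken from the question's sample output):

```shell
convert() {
  awk '
    # Print the buffered block, choosing the header style once we know
    # whether any numbered item was seen. Does nothing for an empty buffer.
    function emit() {
      if (head == "") return
      if (n++) print ""                       # blank line between blocks
      if (list) print "{list: " head "} {" body "\n}"
      else      print "{" head ":} {" body "}"
      head = ""; body = ""; list = 0
    }
    /^$/         { emit(); next }             # blank line ends a block
    head == ""   { head = $0; next }          # first line is the heading
    /^[0-9]+\. / { list = 1                   # numbered line opens an item
                   sub(/^[0-9]+\. /, "")
                   body = body "\n  [item] " $0
                   next }
                 { body = body (body == "" ? "" : " ") $0 }  # continuation
    END          { emit() }
  ' "$@"
}

printf 'heading\n1. foo\n2. bar\n' | convert
# {list: heading} {
#   [item] foo
#   [item] bar
# }
```

Unlike the original, this sketch joins consecutive plain text lines with a space and ignores repeated blank lines.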

sed can do it as well, but the result is harder to read:

#!/bin/sed -nf

/^$/ {                       # empty line: print converted block
  x                          # fetch it from the hold buffer
  s/$/}/                     # append closing }
  /\n  \[item\]/ s/}$/\n}/   # in a list, put in a newline before it
  p                          # print
  d                          # and we're done here. Hold buffer is now empty.
}
x                            # otherwise: inspect the hold buffer
// {                         # if it is empty (reusing last regex)
  x                          # get back the pattern space
  s/.*/{&:}{/                # Format header
  h                          # hold it.
  d                          # we're done here.
}
x                            # otherwise, get back the pattern space
/^[0-9]\+\. / {              # if the line opens a list
  s///                       # remove the number (reusing regex)
  s/.*/  [item] &/           # format the line
  H                          # append it to the hold buffer.
  ${                         # if it is the last line
    s/.*/}/                  # append a closing bracket
    H                        # to the hold buffer
    x                        # swap it with the hold buffer
    p                        # and print that.
  }
  d                          # we're done.
}
                             # otherwise (not opening a list item)
H                            # append line to the hold buffer
x                            # fetch back the hold buffer to work on it

/\n  \[item\]/ {             # if we're in a list
  s/\(.*\)\n/\1 /            # replace the last newline (that we just put there)
                             # with a space
  ${
    s/$/\n}/                 # if this is the last line, append \n}
    p                        # and print
  }
  x                          # put the half-assembled block in the hold buffer
  d                          # and we're done
}
s/\(.*\)\n/\1/               # otherwise (not in a list): just remove the newline
${
  s/$/}/                     # if this is the last line, append closing bracket
  p                          # print
}
x                            # put half-assembled block in the hold buffer.
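
The sed script leans on the hold buffer commands h, H and x to accumulate a block before printing it. A much smaller sketch of the same accumulate-then-print pattern, which only joins all input lines with spaces and is not a replacement for the script above (join_lines is a made-up name):

```shell
join_lines() {
  # 1h   : first line: copy pattern space into the hold buffer
  # 1!H  : later lines: append newline + line to the hold buffer
  # ${}  : on the last line, fetch the buffer, turn newlines into
  #        spaces, and print the result
  sed -n '1h;1!H;${x;s/\n/ /g;p;}' "$@"
}

printf 'a\nb\nc\n' | join_lines
# a b c
```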

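Answer 1: (score: 2)

sed is line-oriented, so it is best suited to simple substitutions on a single line.

Just use awk in paragraph mode (RS=""), so that every block of text delimited by blank lines is treated as a record, and treat each line of a paragraph as a field of that record (FS="\n"):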

$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./,"  [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}

{list: Another Heading} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}
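
To see how paragraph mode carves up the input, here is a minimal sketch that only reports the record number, the field count and the first field of each block (show_records is a made-up name, and the input is a shortened version of the question's sample):

```shell
show_records() {
  # RS=""  : records are separated by one or more blank lines
  # FS="\n": every line of a record becomes one field
  awk 'BEGIN { RS = ""; FS = "\n" }
       { printf "record %d: %d fields, first=[%s]\n", NR, NF, $1 }' "$@"
}

printf 'Some Heading\nexample text\n\nAnother Heading\n1. a\nb\n2. c\n' | show_records
# record 1: 2 fields, first=[Some Heading]
# record 2: 4 fields, first=[Another Heading]
```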

Answer 2: (score: 1)

Another awk version (similar to Ed's)

BEGIN{RS="";FS="\n"}
{
    printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"
    for(i=2;i<=NF;i++)
    printf "%s",sub(/^[0-9]+\./,"  [item]",$i)&&++x?"\n"$i:$i
    print x?"\n}":"}""\n"
    x=0
}

Output

$ awk -f test.awk file

{Some Heading:} {example text}

{Another Heading:} {
  [item] example list item, but itspans over multiple lines
  [item] list item
}

How it works

BEGIN{RS="";FS="\n"}

Reads records as blocks separated by blank lines, and reads fields as lines.

printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"

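Prints the first field (line) in the required format; printf is used so that no newline is emitted. The condition checks whether the record contains, anywhere, a newline followed by digits and a period, and if so adds the List: prefix.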
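
As a small sketch of that whole-record test in isolation (classify is a made-up name): in paragraph mode $0 holds the entire block, so a bare regex is matched against all of its lines at once. Because the pattern starts with \n, a heading that itself begins with a number does not count as a list.

```shell
classify() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       # A bare /regex/ is matched against $0, i.e. the whole block.
       { print $1 ": " (/\n[0-9]+\./ ? "list" : "text") }' "$@"
}

printf 'Some Heading\nexample text\n\nAnother Heading\n1. a\n2. b\n' | classify
# Some Heading: text
# Another Heading: list
```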

for(i=2;i<=NF;i++)

Loops from the second field to the last one. NF is the number of fields.

I'll split the next bit up.

printf "%s"

Prints a string, again using printf to control where newlines go.

sub(/^[0-9]+\./,"  [item]",$i)&&++x?"\n"$i:$i

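This is effectively an if-else statement written with the ternary operator a?b:c. If no substitution can be made, sub returns 0 and x is not incremented, so the line is printed as is. If sub succeeds, it replaces the leading number with [item], x is incremented, and the new line is printed with a newline in front of it.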
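
The return value of sub is what drives that condition. A minimal sketch showing it on two lines, one numbered and one not (sub_demo is a made-up name; with no third argument sub operates on $0):

```shell
sub_demo() {
  # n is the number of substitutions made: 1 on a match, 0 otherwise.
  awk '{ n = sub(/^[0-9]+\./, "  [item]"); print n ": " $0 }' "$@"
}

printf '1. foo\nbar\n' | sub_demo
# 1:   [item] foo
# 0: bar
```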

print x?"\n}":"}""\n"

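This uses the ternary operator again, here to check whether x was incremented. If it was, a newline is printed before the closing }; otherwise only the } is printed. The trailing "\n" is meant to produce the blank line between records, but because concatenation binds more tightly than ?:, it is only appended in the non-list branch; writing print (x?"\n}":"}")"\n" would apply it to both branches.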
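
How awk groups a ternary next to a concatenation can be checked directly. A minimal sketch (precedence_demo is a made-up name):

```shell
precedence_demo() {
  awk 'BEGIN {
    x = 1
    a = x ? "A" : "B" "C"     # grouped as x ? "A" : ("B" "C")
    b = (x ? "A" : "B") "C"   # parentheses force the ternary first
    print a
    print b
  }'
}

precedence_demo
# A
# AC
```

With x set, the unparenthesized form drops the "C", which is the same effect that leaves list blocks without the extra trailing newline.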