I want to write a sed or awk script that, run as awk -f script.awk oldfile > newfile, converts a given text file oldfile with the content
Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item
into a new text file newfile with the content:
{Some Heading:} {example text}
{Another Heading:} {
[item] example list item, but it spans over multiple lines
[item] list item
}
Some further clarification, to rule out possible ambiguity:
How can I do this with sed or awk? (I use zsh, in case that makes a difference.)
Addendum: I just found out that I actually need to mark up front whether a block is a list, i.e. turn
heading
1. foo
2. bar
into
{list: heading}{
[item] foo
[item] bar
}
So I need to output "list:" when the block is a list. Is that possible too?
Answer 0 (score: 4)
With awk, you could do the following:
awk '/^$/ { print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n [item] " $0; next } { block = block (list ? " " : "") $0 } END { print block (list ? "\n}" : "}") }' filename
Spelled out as a commented script, the code is:
#!/usr/bin/awk -f

/^$/ {                                  # empty line: print converted block
  print block (list ? "\n}" : "}")      # Whether there's a newline before the
  block = ""                            # closing } depends on whether this is
  next                                  # a list. Reset block buffer.
}
block == "" {                           # in the first line of a block:
  block = "{" $0 ":} {"                 # format header
  list = 0                              # reset list flag
  next
}
/^[0-9]+\. / {                          # if a data line opens a list
  list = 1                              # set list flag
  sub(/^[0-9]+\. /, "")                 # remove number
  block = block "\n [item] " $0         # format line
  next
}
{                                       # if it doesn't, just append it. Space
  block = block (list ? " " : "") $0    # inside a list to not fuse words.
}
END {                                   # and at the very end, print the last
  print block (list ? "\n}" : "}")      # block
}
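As written, this script formats the heading the moment it reads it, before it knows whether the block contains numbered items, so it cannot produce the "list:" prefix asked for in the addendum. One way around that, sketched here under the assumption that blocks are separated by blank lines as in the sample, is to keep the heading in its own variable and format it only when the block is flushed. The function name convert is hypothetical; it just wraps the awk program so it can be called from sh or zsh.

```shell
# Hypothetical variant of the script above: the heading is buffered separately
# and only formatted on flush, when we know whether the block held list items.
# Usage: convert oldfile > newfile   (or pipe the text in on stdin)
convert() {
  awk '
    function flush() {
      if (head == "") return                        # nothing buffered (e.g. repeated blank lines)
      if (list) printf "{list: %s} {%s\n}\n", head, body
      else      printf "{%s:} {%s}\n", head, body
      head = ""; body = ""; list = 0
    }
    /^$/       { flush(); next }                    # blank line ends a block
    head == "" { head = $0; next }                  # first line of a block is the heading
    /^[0-9]+\. / {                                  # numbered line starts a new item
      list = 1
      sub(/^[0-9]+\. /, "")
      body = body "\n [item] " $0
      next
    }
    { body = body (list ? " " : "") $0 }            # continuation line
    END { flush() }                                 # last block has no trailing blank line
  ' "$@"
}
```

Because the decision is deferred to flush(), the same function covers both the original request and the addendum; the output layout follows the "{heading:} {" spacing used by the answers in this thread.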
It's also possible with sed, but harder to read:
#!/bin/sed -nf

/^$/ {                    # empty line: print converted block
  x                       # fetch it from the hold buffer
  s/$/}/                  # append closing }
  /\n \[item\]/ s/}$/\n}/ # in a list, put in a newline before it
  p                       # print
  d                       # and we're done here. Hold buffer is now empty.
}
x                         # otherwise: inspect the hold buffer
// {                      # if it is empty (reusing last regex)
  x                       # get back the pattern space
  s/.*/{&:} {/            # format header
  h                       # hold it.
  d                       # we're done here.
}
x                         # otherwise, get back the pattern space
/^[0-9]\+\. / {           # if the line opens a list
  s///                    # remove the number (reusing regex)
  s/.*/ [item] &/         # format the line
  H                       # append it to the hold buffer.
  ${                      # if it is the last line
    s/.*/}/               # append a closing bracket
    H                     # to the hold buffer
    x                     # swap it with the hold buffer
    p                     # and print that.
  }
  d                       # we're done.
}
                          # otherwise (not opening a list item)
H                         # append line to the hold buffer
x                         # fetch back the hold buffer to work on it
/\n \[item\]/ {           # if we're in a list
  s/\(.*\)\n/\1 /         # replace the last newline (that we just put there)
                          # with a space
  ${
    s/$/\n}/              # if this is the last line, append \n}
    p                     # and print
  }
  x                       # put the half-assembled block in the hold buffer
  d                       # and we're done
}
s/\(.*\)\n/\1/            # otherwise (not in a list): just remove the newline
${
  s/$/}/                  # if this is the last line, append closing bracket
  p                       # print
}
x                         # put half-assembled block in the hold buffer.
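The core of the sed version is the hold-space pattern: each line is appended to the hold buffer with H, and the buffer is swapped back in with x and printed when an empty line (or the last line) arrives. Stripped of the formatting rules, that pattern reduces to something like the minimal sketch below, which only joins each paragraph onto one line. It assumes GNU sed (for \n in a regex and \+ in a BRE), and join_paragraphs and strip_number are hypothetical helper names. The second function isolates the empty-regex reuse that the script relies on in s///.

```shell
# Hold-space accumulation: collect a paragraph with H, flush it with x
# on a blank line or at end of input, joining its lines with spaces.
join_paragraphs() {
  sed -n '
    /^$/ { x; s/^\n//; s/\n/ /g; p; d; }
    H
    $ { x; s/^\n//; s/\n/ /g; p; }
  '
}

# Empty-regex reuse: s/// applies the last regex used, here the address.
strip_number() {
  sed '/^[0-9]\+\. /s///'
}
```

For example, printf '1. foo\nbar\n' | strip_number prints foo and bar on separate lines: only the numbered line matches the address, so only it has its prefix removed.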
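Answer 1 (score: 2)
sed is line-oriented, so it's best suited to simple substitutions on a single line.
Just use awk in paragraph mode (RS=""), so that each blank-line-separated block of text is treated as a record, and treat each line within a paragraph as a field of that record (FS="\n"):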
$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./," [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}

{list: Another Heading} {
 [item] example list item, but it spans over multiple lines
 [item] list item
}
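Paragraph mode is what keeps this answer short. To see the split it produces, here is a minimal sketch (show_paragraphs is a hypothetical helper name): each blank-line-separated block arrives as one record, each of its lines as one field, and a run of several blank lines still counts as a single separator.

```shell
# Report how awk paragraph mode (RS = "") splits the input:
# one record per blank-line-separated block, one field per line (FS = "\n").
show_paragraphs() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { printf "record %d: %d line(s), heading = %s\n", NR, NF, $1 }' "$@"
}
```

On the sample from the question this reports two records, with 2 and 4 lines respectively, which is exactly the shape the loop over $2..$NF above expects.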
Answer 2 (score: 1)
Another awk version (similar to Ed's):
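BEGIN{RS="";FS="\n"}
{
printf "%s", "{" (/\n[0-9]+\./ ? "list: " $1 : $1 ":") "} {"
for(i=2;i<=NF;i++)
printf "%s", (sub(/^[0-9]+\./," [item]",$i) && ++x) ? "\n" $i : (x ? " " $i : $i)
print (x ? "\n}" : "}") "\n"
x=0
}
$ awk -f test.awk file
{Some Heading:} {example text}

{list: Another Heading} {
 [item] example list item, but it spans over multiple lines
 [item] list item
}

BEGIN{RS="";FS="\n"}
Reads records as blocks separated by blank lines, and fields as lines.
printf "%s", "{" (/\n[0-9]+\./ ? "list: " $1 : $1 ":") "} {"
Prints the first field (the heading line) in the required format; printf is used so that no newline is appended. The regex checks whether the record contains a newline followed by digits and a period, i.e. a numbered line after the heading. If it does, the block is a list and the heading is prefixed with "list: "; otherwise the heading just gets a trailing colon.
for(i=2;i<=NF;i++)
Loops from the second field to the last one. NF is the number of fields.
I'll break the next part down.
printf "%s"
Prints the string, again using printf to control newlines.
(sub(/^[0-9]+\./," [item]",$i) && ++x) ? "\n" $i : (x ? " " $i : $i)
This is effectively an if/else statement written with the ternary operator a ? b : c.
If the substitution can't be made, sub returns 0 and x is not incremented, so the line is printed as is, preceded by a space once a list item has been seen, so that a wrapped item line isn't fused with the line before it.
If sub succeeds, it replaces the leading number with [item], increments x, and prints the new line preceded by a newline.
print (x ? "\n}" : "}") "\n"
Again uses the ternary operator, this time to check whether x was incremented: if it was, a newline is printed before the }, otherwise just the }. The parentheses matter because concatenation binds tighter than ?:, and the trailing "\n" adds the extra newline that leaves a blank line between records.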
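The two awk behaviors this answer leans on can be checked in isolation. The sketch below (sub_demo and precedence_demo are hypothetical names) shows that sub() returns the number of substitutions it made, which is what lets it serve as the ternary's condition, and that concatenation binds tighter than ?:, which is why the final print needs parentheses around the ternary.

```shell
# sub() returns the number of substitutions made (0 or 1 here).
sub_demo() {
  printf '1. foo\nbar\n' | awk '{ n = sub(/^[0-9]+\./, " [item]"); print n ":" $0 }'
}

# Concatenation binds tighter than ?: so an unparenthesized suffix
# belongs only to the else branch.
precedence_demo() {
  awk 'BEGIN {
    x = 1
    print "[" (x ? "A" : "B" "C") "]"     # prints [A]: "C" was part of the else branch
    print "[" ((x ? "A" : "B") "C") "]"   # prints [AC]: "C" is appended in both cases
  }'
}
```

sub_demo prints "1: [item] foo" for the numbered line and "0:bar" for the other one.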