Question

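I'm having a hard time understanding how to use awk to achieve what I want, and after searching for a while I couldn't find the solution I'm looking for.

I have input text that looks like this: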

Some text (possibly containing text within parenthesis).
Some other text
Another line (with something here) with some text
 (
Element 4
)
Another line
 (
Element 1, span 1 to 
Element 5, span 4
)
Another Line

I want to properly format the odd lines between the '(' and ')'. The expected output is as follows:

Some text (possibly containing text within parenthesis).
Some other text
Another line (with something here) with some text
(Element 4)
Another line
(Element 1, span 1 to Element 5, span 4)
Another Line

Looking around on Stack Overflow I found this:
How to select lines between two marker patterns which may occur multiple times with awk/sed

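So what I'm using now is echo $text | awk '/ $/{flag=1;next}/$/{flag=0}flag'

This almost works, except that it filters out the non-matching lines. Here is the output produced by that last command: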

(Element 4)
(Element 1, span 1 to Element 5, span 4)

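Does anyone know how to do this? I'm open to any suggestion, including not using awk if you know something better.

Bonus points if you can teach me how to remove the syntax coloring on my question's code blocks :)

Thanks a billion

Edit: OK, so I accepted @EdMorton's solution since he provided something using awk (well, GNU awk). However, I'm currently having great success with @aaron's sed voodoo incantation, and will probably keep using it until I discover anything new about this particular use case.

I strongly recommend reading EdMorton's explanation; the last paragraph made my day. If anyone passing by has good resources on awk/sed that they can share, please feel free to do so in the comments.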

Answer 1

Here is how I would do it with GNU sed:

s/^\s*(/(/;/^(/{:l N;/)/b e;b l;:e s/\n//g}

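For those who don't speak gibberish, this means:

Remove the leading whitespace from lines that start with optional whitespace followed by an opening parenthesis
Test whether the line now starts with an opening parenthesis. If so, do the following:
- mark this spot as label l, the start of the loop
- append a line of input to the pattern space
- test whether the pattern space now contains a closing parenthesis
- if so, jump to label e
- (if not) jump to label l
- mark this spot as label e, the end of the code
- remove the newlines from the pattern space
(implicitly print the pattern space, whether or not it was modified)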

This could probably be refined, but it does the trick:

$ echo """Some text (possibly containing text within parenthesis).
Some other text
Another line (with something here) with some text
 (
Element 4
)
Another line
 (
Element 1, span 1 to
Element 5, span 4
)
Another Line """ | sed 's/^\s*(/(/;/^(/{:l N;/)/b e;b l;:e s/\n//g}'

Some text (possibly containing text within parenthesis).
Some other text
Another line (with something here) with some text
(Element 4)
Another line
(Element 1, span 1 to Element 5, span 4)
Another Line

Edit: if you can deactivate history expansion (set +H), this sed command is nicer: s/^\s*(/(/;/^(/{:l N;/)/!b l;s/\n//g}
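
For anyone who finds the one-liner hard to follow, here is the same script laid out one command per line, with each step from the list above as a comment. This is only a sketch: the function name join_parens is made up for illustration, and I have swapped GNU's \s for the POSIX class [[:space:]], which should behave the same here.

```shell
# Hypothetical wrapper around the sed script, one command per line.
join_parens() {
    sed '
        # remove leading whitespace in front of an opening parenthesis
        s/^[[:space:]]*(/(/
        # only lines that now start with ( enter the block
        /^(/{
            # label l: start of the loop
            :l
            # append the next input line to the pattern space
            N
            # closing parenthesis seen: jump to label e
            /)/b e
            # otherwise go round the loop again
            b l
            # label e: end of the loop
            :e
            # remove the embedded newlines
            s/\n//g
        }
    '
}
```

Something like printf 'a\n (\nElement 4\n)\nb\n' | join_parens should then print a, (Element 4) and b on three lines. The shorter variant from the edit replaces the two branches and label e with a single negated branch, /)/!b l.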

Answer 2

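sed is for simple substitutions on individual lines, that is all. If you are trying to do anything else with it then you are using constructs that became obsolete in the mid-1970s when awk was invented, are almost certainly non-portable and inefficient, are always just a pile of indecipherable arcane runes, and are used today just for the mental exercise.

The following uses GNU awk for multi-char RS, RT, and the \s shorthand for [[:space:]]. It simply isolates the (...) strings so that you can then do whatever you like with them: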

$ cat tst.awk
BEGIN {
    RS="[(][^)]+[)]"             # a regexp for the string you want to isolate in RT
    ORS=""                       # disable appending of newlines so we print as-is
}
{
    gsub(/\n[[:blank:]]+$/,"\n") # remove any blanks before RT at the start of each line

    sub(/\(\s+/,"(",RT)          # remove spaces after ( in RT
    sub(/\s+\)/,")",RT)          # remove spaces before ) in RT
    gsub(/\s+/," ",RT)           # compress each chain of spaces to one blank char in RT

    print $0 RT                  # print the result
}

$ awk -f tst.awk file
Some text (possibly containing text within parenthesis).
Some other text
Another line (with something here) with some text
(Element 4)
Another line
(Element 1, span 1 to Element 5, span 4)
Another Line

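If you are considering using a sed solution, then think about how you would enhance it if/when you have the slightest requirements change. Any change to the above awk code would be trivial and obvious, while changing the equivalent sed code would require first sacrificing a goat under a blood moon and then breaking out your copy of the Rosetta Stone...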
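
To get a feel for how RS and RT divide up the input in the script above, a tiny experiment may help. This is a sketch that assumes GNU awk is installed as gawk; the helper name show_rt is hypothetical. It prints each record followed by the text that matched RS, which is what gawk stores in RT.

```shell
# Hypothetical helper: show each record and the separator text that ended it.
# Needs GNU awk, since a regexp RS and the RT variable are gawk extensions.
show_rt() {
    gawk '
        # same record separator as in tst.awk: one parenthesised group
        BEGIN { RS = "[(][^)]+[)]" }
        # $0 is the text before the group, RT is the group itself
        { printf "%s|%s\n", $0, RT }
    '
}
```

Running printf 'x(a)y(b)z' | show_rt should print x|(a), then y|(b), then z| (the last record has an empty RT because the input ended without another match).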

Answer 3

Using awk

$ cat fmt.awk
function rem_wsp(s) { # remove white spaces
    gsub(/[\t ]/, "", s)
    return s
}

function beg() {return rem_wsp($0)=="("}
function end() {return rem_wsp($0)==")"}
function dump_block() {
    print "(" block ")"
}

beg() {
    in_block = 1
    next
}

end() {
    dump_block()
    in_block = block = ""
    next
}

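in_block {
    line = $0
    gsub(/^[\t ]+|[\t ]+$/, "", line)      # trim the line before joining it
    sep = (length(block) > 0) ? " " : ""   # reset the separator for every new block
    block = block sep line
    next
}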

{
    print
}

END {
    if (in_block) dump_block()
}

Usage:

$ awk -f fmt.awk fime.dat

Answer 4

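This is doable in awk, though maybe there is a slicker way than this. It looks for the lines between lines that contain only blanks and an open parenthesis or a close parenthesis, and treats them specially. Everything else it prints: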

awk '/^ *\( *$/,/^ *\) *$/ {
        sub(/^ */, "");
        sub(/ *$/, "");
        if ($1 ~ /[()]/) hold = hold $1; else hold = hold " " $0
        if ($0 ~ /\)/) {
            sub(/\( /, "(", hold)
            sub(/ \)/, ")", hold)
            print hold
            hold = ""
        }
        next
     }
     { print }' data

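The variable hold is initially empty. The first pair of sub calls strips leading and trailing blanks (in data copied from the question, there is a blank after span 1 to). The if appends ( or ) to hold without a space, or appends the line to hold after a space. If the close parenthesis is present, it removes the space after the open parenthesis and before the close parenthesis, prints hold, and resets hold to empty. It always skips the rest of the script by using next. The rest of the script is { print }, an unconditional print that minimalists often write as 1.

The file data is a copy'n'paste of the data from the question.

Output: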
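
In case the range pattern and the bare 1 are unfamiliar, here is a stripped-down sketch of just those two pieces. The START and END markers and the function name tag_range are made up for the illustration and are not part of the script above.

```shell
# Hypothetical helper: tag the lines inside a START..END range, pass the rest through.
tag_range() {
    awk '
        # range pattern: applies from a START line through the next END line
        /START/,/END/ { print "in: " $0; next }
        # a bare 1 is an always-true pattern with the default action, i.e. { print }
        1
    '
}

printf 'a\nSTART\nb\nEND\nc\n' | tag_range
```

This should print a, in: START, in: b, in: END and c, one per line. The next in the first rule is what stops the range lines from being printed a second time by the trailing 1.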

Some text (possibly containing text within parenthesis).
Some other text
Another line (with something here) with some text
(Element 4)
Another line
(Element 1, span 1 to Element 5, span 4)
Another Line

The 'Another Line' (with the capital L) has a trailing blank because the data in the question has one.

Formatting text with awk
