Unix: print pattern between strings

Time: 2015-02-10 16:49:15

Tags: unix awk sed pattern-matching

I have a file with contents like the following. START and STOP delimit a block.

START
X | 123
Y | abc
Z | +=-
STOP
START
X | 456
Z | +%$
STOP
START
X | 789
Y | ghi
Z | !@#
STOP

I want to print the values of X and Y for each block in the following format:

123 ~~ abc
456 ~~ 
789 ~~ ghi

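If START/STOP occurred only once, sed -n '/START/,/STOP/p' would help. Since the blocks repeat, I need your help.

3 answers:

Answer 0: (score: 2)

sed is always the wrong choice for any problem that involves processing multiple lines. All of sed's arcane constructs for doing so became obsolete in the mid-1970s, when awk was invented.

Whenever the input contains name-value pairs, I find it useful to build an array that maps each name to its value and then access the array by name. In this case, using GNU awk for multi-character RS and for deleting a whole array: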

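$ cat tst.awk
BEGIN {
    RS = "\nSTOP\n"      # one START...STOP block per record (multi-char RS needs GNU awk)
    OFS=" ~~ "
}
{
    delete n2v           # forget the names and values of the previous block
    for (i=2;i<=NF;i+=3) {
        n2v[$i] = $(i+2) # fields run: START, name, "|", value, name, "|", value, ...
    }
    print n2v["X"], n2v["Y"]
}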

$ gawk -f tst.awk file
123 ~~ abc
456 ~~ 
789 ~~ ghi
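
If GNU awk is not available, the same name-to-value idea can probably be sketched in portable awk by reading line by line instead of relying on a multi-character RS. This is only a sketch under assumptions that are not in the answer above: the print_xy function and the sample.txt file name are made up for illustration, and split("", n2v) stands in for delete n2v as a way to empty the array.

```shell
# Recreate the sample input from the question (sample.txt is an arbitrary name).
cat > sample.txt <<'EOF'
START
X | 123
Y | abc
Z | +=-
STOP
START
X | 456
Z | +%$
STOP
START
X | 789
Y | ghi
Z | !@#
STOP
EOF

# print_xy is a hypothetical helper, not part of the original answer.
print_xy() {
    awk -v OFS=' ~~ ' '
        $1 == "START" { split("", n2v); next }            # new block: empty the map
        $1 == "STOP"  { print n2v["X"], n2v["Y"]; next }  # end of block: report X and Y
        NF >= 3       { n2v[$1] = $3 }                    # "name | value" line: remember it
    ' "$1"
}

print_xy sample.txt
```

On the sample input this should print the same three lines as the gawk version, including the trailing space after ~~ when a block has no Y.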

Answer 1: (score: 2)

Based on my own solution to "How to select lines between two marker patterns which may occur multiple times with awk/sed":

awk -v OFS=" ~~ " '
       /START/{flag=1;next}
       /STOP/{flag=0; print first, second; first=second=""}
       flag && $1=="X" {first=$3}
       flag && $1=="Y" {second=$3}' file

Test

$ awk -v OFS=" ~~ " '/START/{flag=1;next}/STOP/{flag=0; print first, second; first=second=""} flag && $1=="X" {first=$3} flag && $1=="Y" {second=$3}' a
123 ~~ abc
456 ~~ 
789 ~~ ghi
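
One caveat worth noting: with awk's default whitespace splitting, $3 is only the first word of a value, so a line such as "Y | hello world" would yield just "hello". A possible variant, assuming the separator is always exactly " | " (the question does not say so), sets the field separator to that string so the whole value lands in $2. The print_xy_spaces function and the spaces.txt file are hypothetical names used only for this sketch.

```shell
# Hypothetical input whose values contain spaces.
cat > spaces.txt <<'EOF'
START
X | 1 2 3
Y | hello world
STOP
START
X | 456
STOP
EOF

# Same flag logic as the answer; only the field separator and field number change.
print_xy_spaces() {
    awk -F' [|] ' -v OFS=' ~~ ' '
        /START/ { flag=1; next }
        /STOP/  { flag=0; print first, second; first=second="" }
        flag && $1=="X" { first=$2 }     # $2 is everything after " | "
        flag && $1=="Y" { second=$2 }
    ' "$1"
}

print_xy_spaces spaces.txt
```

The bracket expression [|] keeps the pipe literal regardless of how the awk in use treats a bare | in a separator regex.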

Answer 2: (score: 1)

Because I like brain teasers (not because this sort of thing is practical to do in sed), a possible sed solution is

sed -n '/START/,/STOP/ { //!H; // { g; /^$/! { s/.*\nX | \([^\n]*\).*/\1 ~~/; ta; s/.*/~~/; :a G; s/\n.*Y | \([^\n]*\).*/ \1/; s/\n.*//; p; s/.*//; h } } }'

It works as follows:

/START/,/STOP/ {                        # between two start and stop lines
  //! H                                 # assemble the lines in the hold buffer
                                        # note that // repeats the previously
                                        # matched pattern, so // matches the
                                        # start and end lines, //! all others.

  // {                                  # At the end
    g                                   # That is: When it is one of the
    /^$/! {                             # boundary lines and the hold buffer
                                        # is not empty

      s/.*\nX | \([^\n]*\).*/\1 ~~/     # isolate the X value, append ~~

      ta                                # if there is no X value, just use ~~
      s/.*/~~/
      :a 

      G                                 # append the hold buffer to that
      s/\n.*Y | \([^\n]*\).*/ \1/       # and isolate the Y value so that
                                        # the pattern space contains X ~~ Y

      s/\n.*//                          # Cutting off everything after a newline
                                        # is important if there is no Y value
                                        # and the previous substitution did
                                        # nothing

      p                                 # print the result

      s/.*//                            # and make sure the hold buffer is
      h                                 # empty for the next block.
    }
  }
}
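
To try the script above without fighting shell quoting, it can be saved to a file and run with sed -n -f. This is a minimal sketch that assumes GNU sed (the script relies on \n being understood as a newline inside a bracket expression); the xy.sed and sample.txt file names are arbitrary and not part of the answer.

```shell
# Recreate the sample input from the question.
cat > sample.txt <<'EOF'
START
X | 123
Y | abc
Z | +=-
STOP
START
X | 456
Z | +%$
STOP
START
X | 789
Y | ghi
Z | !@#
STOP
EOF

# The same commands as in the annotated listing, one per line, comments removed.
cat > xy.sed <<'EOF'
/START/,/STOP/ {
  //!H
  // {
    g
    /^$/! {
      s/.*\nX | \([^\n]*\).*/\1 ~~/
      ta
      s/.*/~~/
      :a
      G
      s/\n.*Y | \([^\n]*\).*/ \1/
      s/\n.*//
      p
      s/.*//
      h
    }
  }
}
EOF

sed -n -f xy.sed sample.txt
```

Unlike the awk answers, this version leaves no trailing space after ~~ when a block has no Y, because the final substitution cuts the line right after the ~~.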