Find and modify patterns between a start pattern and an end pattern, and update the file

Time: 2017-04-01 12:38:44

Tags: regex awk gawk

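I want to find and modify patterns between a start pattern and an end pattern, and update multiple files this way. These are the steps I am breaking this into, to implement it with awk/sed: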

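  1. Find the occurrences of the string between 'startpat' and 'endpat' (capture the lines from the start to the end).
  2. Modify the strings in that instance, e.g. update 'sss:ccc' to 'sss:ddd' and 'brr:mmm' to 'brr:rel/ccc'.
  3. Create a new set of lines from 'startpat' to 'endpat' using the strings updated in step 2.
  4. Insert that new set at the beginning of the file, right after '---'.
  5. Delete the last set of lines between 'startpat' and 'endpat' if it matches the strings 'sss:aaa' and 'brr:rel/aaa'.
  6. Note: most importantly, the indentation must be preserved, because I am working with JSON/YAML files.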

    Input file format (P.S. the `//` comments are annotations only; ignore them when parsing the file):

    ---
     - startpat:        // Startpat - make this line inclusive
        ...
        sss: ccc        // pattern to be modified
        ppp: 'vvv'
        pname: 'vvv'
        brr: 'mmm'      // pattern to be modified
        jdk: jdk8
        jdks:
          - jdk8
          - jdk7
        file:
          - test:
              exec: 'input'
        ...
    
     - startpat:        // Endpat - make this line exclusive
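The `//` notes in the sample above are annotations, not part of the YAML. To feed the sample to a script as-is, a minimal sketch like the following strips them first while keeping the indentation (`strip_comments` is a hypothetical name). It deletes everything from the first `//` to the end of the line, so it would also truncate values that legitimately contain `//`, such as URLs.

```sh
# Hypothetical helper: remove "// ..." annotations (and the blanks before
# them) from each line; leading indentation is left untouched.
strip_comments() {
    awk '{ sub(/[ \t]*\/\/.*$/, "") } 1' "$@"
}

# usage: strip_comments sample.yaml > clean.yaml
```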
    

    Expected output after processing:

    ---
     - startpat:
        sss: ddd
        ppp: 'vvv'
        pname: 'vvv'
        brr: 'mmm'
        jdk: jdk8
        jdks:
          - jdk8
          - jdk7
        file:
          - test:
              exec: 'input'
    
     - startpat:        // Startpat
        ...
        sss: ccc
        ppp: 'vvv'
        pname: 'vvv'
        brr: 'mmm'
        jdk: jdk8
        jdks:
          - jdk8
          - jdk7
        file:
          - test:
              exec: 'input'
        ...
    
     - startpat:        // Endpat
    

1 answer:

Answer 0 (score: 2)

I think the simplest approach is to store every line in an array. To get you started:

$ cat f.awk
BEGIN {
    # build regular expressions to match the "start pattern" and the
    # "end pattern" (in the question they are the same)

    ws = "[\\t ]*"            # white space
    sp = "^" ws "- startpat:" # [s]tart [p]attern
    ep = sp                   # [e]nd   [p]attern

    # a regular expression to match "---",
    # possibly surrounded by white space
    op = "^" ws "---" ws "$" # where to start appending
}

{ f[NR] = $0 } # save every line to an array

END {
    n = NR # number of lines in the file

    find_blocks() # sets `nb` (number of blocks) and the arrays `ss` (starts), `ee` (ends)

    for (ib = 1; ib <= nb; ib++)
        process_block(ss[ib], ee[ib]) # pass the start and end line of each block;
                                      # sets `nex` (number of extra lines) and `eex`
    write()
}

function find_blocks(   i, l, is, ie) {
    for (i = 1; i <= n; i++) {
        l = f[i]
        if (is > ie && l ~ ep) ee[++ie] = i # end
        if (           l ~ sp) ss[++is] = i # start
    }
    nb = ie
}

function process_block(is, ie,   i, l) {
    for (i = is + 1; i <= ie - 1; i++) {
        l = f[i]
        # modify a line (an example)
        if (l ~ /brr:/) sub(/'mmm'/, "'rel/cc'", l)

        eex[++nex] = l # push the line to another array
    }
}

function write(   i, j, l) {
    i = 1
    while (i <= n) { # print everything before "---"
        print l = f[i++]
        if (l ~ op) break
    }

    for (j = 1; j <= nex; j++) # add an extra part
        print eex[j]

    while (i <= n)            # print the part after "---"
        print f[i++]
}
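The answer's `process_block` starts at `is + 1`, so the copied lines lose their `- startpat:` header, and it only rewrites `brr:`. Below is a hypothetical, self-contained variant (not the answer's exact code) that keeps the header line and applies both substitutions from step 2 (`sss: ccc` to `sss: ddd`, `brr: 'mmm'` to `brr: 'rel/ccc'`). The function name `prepend_modified` is made up. Block boundaries follow the answer: a block runs from one `- startpat:` line up to the next one, and the last `- startpat:` only closes the previous block. The question's expected output leaves `brr: 'mmm'` unchanged in the copy; if that is what you want, drop the second `sub()`.

```sh
# Hypothetical variant of f.awk wrapped in a shell function. \047 is a
# single quote; it is spelled that way because the awk program itself is
# inside shell single quotes.
prepend_modified() {
    awk '
    BEGIN {
        ws = "[\t ]*"
        sp = "^" ws "- startpat:"    # start pattern (also ends a block)
        op = "^" ws "---" ws "$"     # the copies go right after this line
        q  = "\047"
    }
    { f[NR] = $0 }                   # keep every line
    END {
        # a block runs from one start line up to, not including, the next
        for (i = 1; i <= NR; i++)
            if (f[i] ~ sp) {
                if (start) copy_block(start, i - 1)
                start = i
            }
        # print everything up to and including "---" ...
        for (i = 1; i <= NR; i++) {
            print f[i]
            if (f[i] ~ op) break
        }
        # ... then the modified copies, then the rest of the file
        for (j = 1; j <= nex; j++) print eex[j]
        for (i++; i <= NR; i++) print f[i]
    }
    function copy_block(from, to,   i, l) {
        for (i = from; i <= to; i++) {
            l = f[i]
            sub(/sss: ccc/, "sss: ddd", l)                     # step 2
            sub("brr: " q "mmm" q, "brr: " q "rel/ccc" q, l)   # step 2
            eex[++nex] = l
        }
    }' "$@"
}

# usage: prepend_modified input > output
```

Declaring `i` and `l` as extra parameters of `copy_block` makes them local, so the call does not disturb the `i` of the loop in `END`.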

Input file:

$ cat input
---
 - startpat:
    XXXXX
    brr: 'mmm'
 - startpat:
    YYYYY        
    brr: 'mmm'
 - startpat:

Usage:

awk -f f.awk input
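Because f.awk collects lines in `f[NR]` and prints everything in `END`, running it once over several files (`awk -f f.awk a.yaml b.yaml`) would merge them into one stream. To update multiple files, as the question asks, one option is a separate awk run per file. A minimal sketch, assuming a POSIX shell with `mktemp`; `update_in_place` is a hypothetical name. Copying the result back with `cat` instead of `mv` keeps each file's original permissions.

```sh
# Hypothetical helper: run an awk script on each file separately and
# overwrite the file only when awk succeeds.
update_in_place() {
    script=$1
    shift
    for file in "$@"; do
        tmp=$(mktemp) || return 1
        if awk -f "$script" "$file" > "$tmp"; then
            cat "$tmp" > "$file"        # rewrite in place, keeps permissions
        else
            echo "awk failed on $file; left unchanged" >&2
        fi
        rm -f "$tmp"
    done
}

# usage: update_in_place f.awk *.yaml
```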

Output:

---
    XXXXX
    brr: 'rel/cc'
    YYYYY        
    brr: 'rel/cc'
 - startpat:
    XXXXX
    brr: 'mmm'
 - startpat:
    YYYYY        
    brr: 'mmm'
 - startpat:
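The answer does not cover step 5. A hypothetical sketch, using the same block boundaries as f.awk: it finds the last complete block (from the second-to-last `- startpat:` line up to, not including, the last one) and drops it only if it contains both `sss: aaa` and `brr: rel/aaa`. Requiring both strings in the same block, and allowing optional quotes around `rel/aaa`, are assumptions about what the question means; `drop_last_block` is a made-up name.

```sh
# Hypothetical sketch of step 5; \047 is a single quote (the awk program
# sits inside shell single quotes).
drop_last_block() {
    awk '
    BEGIN {
        sp = "^[\t ]*- startpat:"            # start/end pattern, as in f.awk
        q  = "\047"
        p1 = "sss:[ \t]*aaa"                 # assumed: both must appear
        p2 = "brr:[ \t]*" q "?rel/aaa" q "?" # quotes around the value optional
    }
    { f[NR] = $0 }
    END {
        # ls..le = last complete block: one start line up to, not
        # including, the following start line
        for (i = 1; i <= NR; i++)
            if (f[i] ~ sp) {
                if (cur) { ls = cur; le = i - 1 }
                cur = i
            }
        if (ls) {
            for (i = ls; i <= le; i++) {
                if (f[i] ~ p1) has1 = 1
                if (f[i] ~ p2) has2 = 1
            }
        }
        drop = has1 && has2
        # print every line except the dropped block (indentation untouched)
        for (i = 1; i <= NR; i++)
            if (!(drop && i >= ls && i <= le)) print f[i]
    }' "$@"
}

# usage: drop_last_block input > output
```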